
Ceph cluster installation (jewel on Ubuntu xenial)



This tutorial has been prepared for the course "Cloud Storage Solutions" organized by INFN. Participants are divided into groups of two.

Setup:

Each group can access 4 nodes:

  • 1 admin node: deploy, mon, mds
  • 3 osd nodes

The hostnames of the nodes include the group number, e.g. participants of group 1 will use the following nodes:

  • ceph-adm-1
  • ceph-node1-1
  • ceph-node2-1
  • ceph-node3-1

In the following sections, the variable GN refers to the group number.

Preparation

On each node

  • (skip this step if the nodes are registered in DNS) edit the file /etc/hosts:

    x.x.x.x ceph-adm-$GN
    y.y.y.y ceph-node1-$GN
    z.z.z.z ceph-node2-$GN
    w.w.w.w ceph-node3-$GN
  • create the user ceph-deploy and set the password:
sudo useradd -d /home/ceph-deploy -m ceph-deploy
sudo passwd ceph-deploy
  • give the user full sudo privileges by adding the following line to /etc/sudoers.d/ceph:
echo "ceph-deploy ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph

Then restrict the permissions of the file:

sudo chmod 0440 /etc/sudoers.d/ceph

On the admin node:

  • configure your admin node with password-less SSH access to each node running Ceph daemons (leave the passphrase empty). On the admin node ceph-adm-$GN, become the ceph-deploy user and generate the SSH key:
# su - ceph-deploy
$ ssh-keygen -t rsa

You will have output like this:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/ceph-deploy/.ssh/id_rsa): 
Created directory '/home/ceph-deploy/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/ceph-deploy/.ssh/id_rsa.
Your public key has been saved in /home/ceph-deploy/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:VKLZ4+bDjm07NuCtvyoHUItSE8tw9DFptYhXD3dkWrk ceph-deploy@ceph-adm-0
The key's randomart image is:
+---[RSA 2048]----+
|..+.oo+ o.*.     |
| +o++= O B.      |
| .+=+.+ *  .     |
|. o..  o .E      |
| . .    S        |
|    . .+         |
|     o o+        |
|    . o+*.       |
|     o+**=       |
+----[SHA256]-----+

Copy the key to each cluster node and test the password-less access:

ssh-copy-id ceph-deploy@ceph-node1-$GN

ssh-copy-id ceph-deploy@ceph-node2-$GN

ssh-copy-id ceph-deploy@ceph-node3-$GN

If password-based remote login is disabled, copy and paste the content of the file /home/ceph-deploy/.ssh/id_rsa.pub from the admin node into the file /home/ceph-deploy/.ssh/authorized_keys on each of the other nodes (create the .ssh directory first if needed), as sketched below.
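
A minimal sketch of doing this by hand (assuming the ceph-deploy user already exists on the target node):

# run on each target node, logged in as ceph-deploy
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# paste the admin node's id_rsa.pub content, then press Ctrl-D
cat >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys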


Note: Ubuntu 16.04.1 LTS does not ship with Python 2.7. To install Ceph using ceph-deploy, make sure Python 2 is installed on all the nodes:

# sudo apt-get install -y python

Start installation

On the admin node:

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo deb http://download.ceph.com/debian-jewel/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get -qqy update && sudo apt-get install -qqy ntp ceph-deploy
mkdir cluster-ceph
cd cluster-ceph
ceph-deploy new ceph-adm-$GN

Add the following line to the [global] section of ceph.conf:

osd pool default size = 2
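
After this edit, the [global] section should look roughly like this (the fsid, hostname and monitor address are the ones generated for your own cluster and will differ):

[global]
fsid = 72766f71-c301-4352-8389-2fdfdee8333a
mon_initial_members = ceph-adm-0
mon_host = 90.147.102.33
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 2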

Install ceph:

ceph-deploy install --release jewel ceph-adm-$GN ceph-node1-$GN ceph-node2-$GN ceph-node3-$GN

Add the initial monitor(s) and gather the keys:

ceph-deploy mon create-initial

When the command completes, the gathered keyrings are available in the working directory:

ceph-deploy@ceph-adm-0:~/cluster-ceph$ ll
total 204
drwxrwxr-x 2 ceph-deploy ceph-deploy   4096 Sep 28 09:44 ./
drwxr-xr-x 4 ceph-deploy ceph-deploy   4096 Sep 28 09:20 ../
-rw------- 1 ceph-deploy ceph-deploy     71 Sep 28 09:44 ceph.bootstrap-mds.keyring
-rw------- 1 ceph-deploy ceph-deploy     71 Sep 28 09:44 ceph.bootstrap-osd.keyring
-rw------- 1 ceph-deploy ceph-deploy     71 Sep 28 09:44 ceph.bootstrap-rgw.keyring
-rw------- 1 ceph-deploy ceph-deploy     63 Sep 28 09:44 ceph.client.admin.keyring
-rw-rw-r-- 1 ceph-deploy ceph-deploy    227 Sep 28 09:20 ceph.conf
-rw-rw-r-- 1 ceph-deploy ceph-deploy 164928 Sep 28 09:44 ceph-deploy-ceph.log
-rw------- 1 ceph-deploy ceph-deploy     73 Sep 28 09:19 ceph.mon.keyring
-rw-r--r-- 1 root        root          1645 Oct 15  2015 release.asc

Check the disks that will be used for the OSDs:

ceph-deploy disk list ceph-node1-$GN
ceph-deploy disk list ceph-node2-$GN
ceph-deploy disk list ceph-node3-$GN

Create the OSDs on ceph-node1, ceph-node2 and ceph-node3, using the vdc device on each node:

ceph-deploy osd create ceph-node1-$GN:vdc ceph-node2-$GN:vdc ceph-node3-$GN:vdc 

Use ceph-deploy to copy the configuration file and admin key to your admin node and your Ceph Nodes so that you can use the ceph CLI without having to specify the monitor address and ceph.client.admin.keyring each time you execute a command:

ceph-deploy admin ceph-adm-$GN ceph-node1-$GN ceph-node2-$GN ceph-node3-$GN

Ensure that you have the correct permissions for the ceph.client.admin.keyring.

sudo chmod +r /etc/ceph/ceph.client.admin.keyring

Now you can check the status of your cluster using the following command:

ceph -s

The output should look like this:

ceph-deploy@ceph-adm-0:~/cluster-ceph$ ceph -s
    cluster 72766f71-c301-4352-8389-2fdfdee8333a
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-adm-0=90.147.102.33:6789/0}
            election epoch 3, quorum 0 ceph-adm-0
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v46: 64 pgs, 1 pools, 0 bytes data, 0 objects
            100 MB used, 76658 MB / 76759 MB avail
                  64 active+clean

Check the cluster's hierarchy:

ceph osd tree

Example output:

ceph-deploy@ceph-adm-0:~/cluster-ceph$ ceph osd tree
ID WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.07320 root default                                            
-2 0.02440     host ceph-node1-0                                   
 0 0.02440         osd.0              up  1.00000          1.00000 
-3 0.02440     host ceph-node2-0                                   
 1 0.02440         osd.1              up  1.00000          1.00000 
-4 0.02440     host ceph-node3-0                                   
 2 0.02440         osd.2              up  1.00000          1.00000 

Get information about the pools:

ceph osd dump |grep pool

Example output:

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

Add the metadata server:

ceph-deploy mds create ceph-adm-$GN

Check the mds status:

ceph mds stat

You should see something like this:

ceph-deploy@ceph-adm-0:~/cluster-ceph$ ceph mds stat
e2:, 1 up:standby

A Ceph filesystem requires at least two RADOS pools, one for data and one for metadata.

ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 64

Once the pools are created, you may enable the filesystem using the fs new command:

ceph fs new cephfs cephfs_metadata cephfs_data

Once a filesystem has been created, the MDS will be able to enter an active state. Check again its status:

ceph mds stat

You will see the MDS first in the creating state and then in the active state, like this:

ceph-deploy@ceph-adm-0:~/cluster-ceph$ ceph mds stat
e5: 1/1/1 up {0=ceph-adm-0=up:active}

Operating the cluster

To start all daemons:

systemctl start ceph.target 

Check the status of an osd:

systemctl status ceph-osd@12      # check status of osd.12

Start/Stop of an osd:

systemctl [start|stop] ceph-osd@12      # start/stop osd.12

Pools

List pools:

ceph osd lspools

To show pool utilization statistics, execute:

rados df

Set a value to a pool:

ceph osd pool set {pool-name} {key} {value}

Example: change the replication factor for a pool:

ceph osd pool set data size 2

Basic Usage

RADOS

Try to store a file in the "data" pool (create the pool first if it does not exist) using a command like this:

rados put {object-name} {file-path} --pool=data

Run this command to check that the object has been stored in the pool "data":

rados ls -p data

You can identify the object location with: ceph osd map {pool-name} {object-name}
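
A complete worked example (the pg count of 64, the object name testobj and the input file /etc/hosts are just illustrative choices):

ceph osd pool create data 64
rados put testobj /etc/hosts --pool=data
rados ls -p data
ceph osd map data testobj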

RADOS BLOCK DEVICE (RBD)

We will use the rbd pool which was created during the cluster installation. Create a new image of size 1 GB:

rbd create --size 1024 testimg

List the rbd images:

rbd ls

Show detailed information on a specific image:

rbd info testimg

Note: Ceph block images are thin-provisioned: no physical storage is actually used until you begin saving data to them. To check how much space the rbd pool is using:

rados df

or

rbd du

You will get something like this:

warning: fast-diff map is not enabled for testimg. operation may be slow.
NAME    PROVISIONED   USED 
testimg       1024M 53248k 

Now let's map and mount the block device:

sudo rbd map testimg

Note: Ceph Jewel, by default, creates new RBD images with features currently unsupported by krbd. You can disable these features on an existing image using the 'rbd feature disable' command, or you can revert to pre-Jewel features for all new images by adding rbd default features = 1 to your Ceph configuration file.
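
For example, a sketch of disabling the extra Jewel features on the testimg image (the exact set of features your kernel's krbd supports may differ):

rbd feature disable testimg deep-flatten fast-diff object-map exclusive-lock
rbd info testimg     # the features line should now list only: layering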


List the mapped RBD devices:

rbd showmapped

Create a filesystem on it and mount it:

root@ceph-adm-0:~# mkfs.xfs /dev/rbd0 
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@ceph-adm-0:~# 
root@ceph-adm-0:~# mkdir /mnt/myrbd
root@ceph-adm-0:~# mount /dev/rbd0 /mnt/myrbd/

Write some data:

root@ceph-adm-0:~# dd if=/dev/urandom of=/mnt/myrbd/file1 bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB, 9,8 MiB) copied, 0,716304 s, 14,3 MB/s

Check the used space:

root@ceph-adm-0:~# rbd du
warning: fast-diff map is not enabled for testimg. operation may be slow.
NAME     PROVISIONED   USED 
testimg        1024M 65536k 

Create a snapshot:

rbd snap create rbd/testimg@snap1

Check:

root@ceph-adm-0:~# rbd ls -l
NAME           SIZE PARENT FMT PROT LOCK 
testimg       1024M          2           
testimg@snap1 1024M          2           

root@ceph-adm-0:~# rbd du
warning: fast-diff map is not enabled for testimg. operation may be slow.
NAME          PROVISIONED   USED 
testimg@snap1       1024M 65536k 
testimg             1024M      0 
<TOTAL>             1024M 65536k 

Exercise: try to write something to the original block device, then map and mount the snapshot.

Check the content of the snapshot and the output of the command rbd du


Hint: to mount the snapshot you will need the mount options -o ro,nouuid (see the sketch below).
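
A possible sequence, assuming the snapshot gets mapped to /dev/rbd1 (check with rbd showmapped) and using /mnt/mysnap as a hypothetical mountpoint:

sudo rbd map rbd/testimg@snap1
rbd showmapped
sudo mkdir /mnt/mysnap
sudo mount -o ro,nouuid /dev/rbd1 /mnt/mysnap
ls /mnt/mysnap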


CEPH FILESYSTEM

To mount CephFS with FUSE, first install ceph-fuse with the command

sudo apt-get install ceph-fuse

and then mount it, running:

sudo ceph-fuse -m {monitor_hostname:6789} {mount_point_path}

Note 1: you can get the address of one of the Ceph monitors with:

ceph mon stat

Note 2: the mount_point_path must exist before you can mount the Ceph filesystem. In our case the mountpoint is the directory /ceph-fs, which we create with:

sudo mkdir /ceph-fs
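
In this tutorial the monitor runs on the admin node, so the complete mount command looks like this (a sketch using the default admin keyring in /etc/ceph):

sudo ceph-fuse -m ceph-adm-$GN:6789 /ceph-fs
df -h /ceph-fs     # the mounted CephFS should now show up here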

Restore procedure

If at any point you run into trouble and you want to start over, execute the following to purge the configuration:

ceph-deploy purgedata {ceph-node} [{ceph-node}]
ceph-deploy forgetkeys

To purge the Ceph packages too, you may also execute:

ceph-deploy purge {ceph-node} [{ceph-node}]

If you execute purge, you must re-install Ceph.
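
For example, a full reset of this tutorial's four nodes, run from the cluster-ceph directory on the admin node, would look like this:

ceph-deploy purgedata ceph-adm-$GN ceph-node1-$GN ceph-node2-$GN ceph-node3-$GN
ceph-deploy forgetkeys
ceph-deploy purge ceph-adm-$GN ceph-node1-$GN ceph-node2-$GN ceph-node3-$GN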