Ceph support #380

Merged
merged 3 commits into from Oct 5, 2015
git-harry
Contributor

This adds support for Ceph, including:

  • deploying a Ceph cluster
  • integrating with openstack-ansible Ceph support
  • configuring monitoring and logging

@git-harry
Contributor Author

@nrb - the travis build appears to have failed because it is testing the third-party roles that are added as submodules by this pull request. What needs to be done to exclude them from the testing?

@@ -49,7 +72,8 @@ fi
which openstack-ansible || ./scripts/bootstrap-ansible.sh

# ensure all needed passwords and tokens are generated
./scripts/pw-token-gen.py --file /etc/openstack_deploy/user_extras_secrets.yml
Contributor

The bootstrap does user_secrets, I thought, so running this against user_extras_secrets afterwards was intentional? So changing this back might be a patch/merge snafu?

Contributor Author

@claco line 76 './scripts/pw-token-gen.py --file $RPCD_SECRETS' does the same thing as the line that was removed. pw-token-gen.py is only run by the bootstrap-aio.sh and run-upgrade.sh scripts in openstack-ansible.

@git-harry git-harry force-pushed the ceph-squash branch 2 times, most recently from 037f3e1 to 571fab5 Compare August 28, 2015 13:46
@mattt416
Contributor

Thanks @git-harry . We will need to bump osad's sha, however I guess we'll need to first wait for https://review.openstack.org/#/c/209537/ to merge and get backported?

@nrb
Contributor

nrb commented Aug 28, 2015

I'm loath to bump master's SHA until we get 11.0 actually released, but that's likely to be something for further discussion. In terms of excluding the external roles, we'll likely have to add a grep or find excluding those to the ansible-lint line.

Probably the syntax checking line, too.

@git-harry
Contributor Author

@nrb - in the end we needed to delete the roles before ansible-lint runs - https://github.com/git-harry/rpc-openstack/blob/ceph-squash/scripts/linting.sh#L51 because the alternative was to exclude 5 playbooks.

@nrb
Contributor

nrb commented Aug 28, 2015

Ok, that works too.

@mattt416
Contributor

Looks like ansible-ceph-common sha will need to be bumped to include https://github.com/ceph/ansible-ceph-common/commit/92f9f72bf94d79a4c988fea7c2d7a2da19b4edc7.

@d34dh0r53
Contributor

Hey guys,

I'm getting the following error during the ceph-client | Install ceph packages play:

failed: [d34d-test1_glance_container-6879f982] => (item=({'client': [u'glance'], 'component': 'glance_api', 'service': ['glance-api']}, 'python-ceph')) => {"attempts": 5, "failed": true, "item": [{"client": ["glance"], "component": "glance_api", "service": ["glance-api"]}, "python-ceph"]}
stderr: E: There are problems and -y was used without --force-yes

stdout: Reading package lists...
Building dependency tree...
Reading state information...
The following extra packages will be installed:
  libboost-system1.54.0 libboost-thread1.54.0 libcephfs1 liblttng-ust-ctl2
  liblttng-ust0 libnspr4 libnss3 libnss3-nssdb librados2 librbd1 liburcu1
  python-cephfs python-rados python-rbd
The following NEW packages will be installed:
  libboost-system1.54.0 libboost-thread1.54.0 libcephfs1 liblttng-ust-ctl2
  liblttng-ust0 libnspr4 libnss3 libnss3-nssdb librados2 librbd1 liburcu1
  python-ceph python-cephfs python-rados python-rbd
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 11.9 MB of archives.
After this operation, 30.0 MB of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
  libcephfs1 librados2 librbd1 python-rados python-rbd python-cephfs
  python-ceph

msg: Task failed as maximum retries was encountered

Probably needs something like this:

- apt-key: url=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc state=present

but not sure which play needs that.

@git-harry
Contributor Author

@d34dh0r53 - I think this is the upstream issue that will be addressed by https://review.openstack.org/#/c/228894/

@d34dh0r53
Contributor

@git-harry cherry picking that patch did the trick but now I'm running into another issue where one of my ceph_osd containers is creating a journal file that is filling up the disk:

root@d34d-test1_ceph_osd_container-29bb9dc4:/var/lib/ceph/osd/mydir1# ls -al
total 4024896
drwxr-xr-x  3 root root       4096 Sep 30 16:30 .
drwxr-xr-x  3 root root       4096 Sep 30 16:30 ..
-rw-r--r--  1 root root        580 Sep 30 16:30 activate.monmap
-rw-r--r--  1 root root          3 Sep 30 16:30 active
-rw-r--r--  1 root root         37 Sep 30 16:30 ceph_fsid
drwxr-xr-x 74 root root       4096 Sep 30 16:30 current
-rw-r--r--  1 root root          0 Sep 30 16:30 fiemap_test
-rw-r--r--  1 root root         37 Sep 30 16:30 fsid
-rw-r--r--  1 root root 5368709120 Sep 30 16:30 journal
-rw-------  1 root root         56 Sep 30 16:30 keyring
-rw-r--r--  1 root root         21 Sep 30 16:30 magic
-rw-r--r--  1 root root          6 Sep 30 16:30 ready
-rw-r--r--  1 root root          4 Sep 30 16:30 store_version
-rw-r--r--  1 root root         53 Sep 30 16:30 superblock
-rw-r--r--  1 root root          0 Sep 30 16:30 upstart
-rw-r--r--  1 root root          2 Sep 30 16:30 whoami

This may be related to being on an AIO. Are there configuration parameters that need to be adjusted for an AIO? The one setting I found in ceph-common for journal size was already at 0 (which may mean unlimited, not sure).

Thanks

@git-harry
Contributor Author

@d34dh0r53 - when building an AIO some of the variables do get updated/added. The journal size is being set to 5120 when on an AIO [1]. It looks like you are using a flavour with an ephemeral disk of at least 250 GB; when that is the case, the OSA scripts put the containers into logical volumes. All prior testing had been done with smaller instances or with dedicated OSD hosts.

[1] - https://github.com/rcbops/rpc-openstack/pull/380/files#diff-93518fda10b0403d3c5c20b4df4740a6R54
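The 5368709120-byte journal file in the listing above is what a journal_size of 5120 produces if the value is read as MiB (Ceph documents its journal size setting in megabytes). A minimal Python check of that arithmetic follows. The helper names are hypothetical, and the 5 GiB volume size passed in the example calls is an illustrative assumption, not a value read from the deployment.

```python
MIB = 1024 * 1024  # bytes per MiB


def journal_bytes(journal_size_mib):
    """Convert a journal_size value (assumed to be in MiB) to bytes."""
    return journal_size_mib * MIB


def fits_in_volume(journal_size_mib, volume_size_gib):
    """Return True if the journal is strictly smaller than the volume."""
    return journal_bytes(journal_size_mib) < volume_size_gib * 1024 * MIB


print(journal_bytes(5120))  # 5368709120, the size reported by ls
print(fits_in_volume(5120, 5))
print(fits_in_volume(1024, 5))
```

Under these assumptions a 5120 MiB journal occupies an entire 5 GiB volume by itself, while a 1024 MiB journal leaves room for the OSD data.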

@mattt416
Contributor

mattt416 commented Oct 1, 2015

@d34dh0r53 , what @git-harry said. I've always built this on an AIO where LVM isn't used for containers, so never hit this. I can see it being a problem though since our container_size is limited to 5G. Should we just decrease the journal size to 1G so that this should work irrespective of AIO instance? Also @git-harry, can you please update scripts/deploy.sh to overwrite journal_size and raw_multi_journal rather than append? Or shall we handle this in a subsequent patch?
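The difference between appending a variable and overwriting it can be sketched in Python. This is not the change made to scripts/deploy.sh. The helper `set_var` is hypothetical, it assumes a flat file of top-level `key: value` lines, and the replacement values are illustrative only.

```python
import re


def set_var(text, key, value):
    """Return text with the first top-level `key: value` line rewritten.

    If the key is present its first line is replaced in place; the key
    is appended only when it is absent, so repeated runs of the same
    call do not add duplicate lines.
    """
    pattern = re.compile(r"^%s:.*$" % re.escape(key), re.MULTILINE)
    line = "%s: %s" % (key, value)
    if pattern.search(text):
        return pattern.sub(lambda _match: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


# Starting values taken from the commit message; new values are examples.
config = "journal_size: 80000\nraw_multi_journal: true\n"
config = set_var(config, "journal_size", 1024)
config = set_var(config, "raw_multi_journal", "false")
print(config)
```

Appending instead would leave two `journal_size` lines in the file, and which one wins would then depend on how the YAML loader treats duplicate keys.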

@prometheanfire
Contributor

does this need to be squashed?

@git-harry
Contributor Author

@prometheanfire - I think the first three commits should be kept separate. The work logically broke along those lines so I thought it made things easier to view. The rest of the commits are recent changes that I didn't squash down to make it clear that new changes had been introduced. The pull request has been around for so long I didn't want anyone who'd previously reviewed it to think any changes in the SHAs were just due to rebasing. If you guys would prefer it squashed, I don't have an issue with reducing it down to three commits.

@mattt416
Contributor

mattt416 commented Oct 2, 2015

I like the idea of squashing it into 3 commits. @git-harry do you want to include #430 into this PR since it's documentation that probably should accompany the work, or prefer to leave it as is in a separate commit?

@git-harry
Contributor Author

@mattt416 I ripped out the Ceph-related bits from #430 and included them here. I've also squashed everything back down to the original three commits.

@mattt416
Contributor

mattt416 commented Oct 2, 2015

👍

@prometheanfire
Contributor

cool, lgtm 👍 but needs a rebase

@d34dh0r53
Contributor

👍

@d34dh0r53 d34dh0r53 added this to the r11.1.0rc0 milestone Oct 2, 2015
mattt416 and others added 3 commits October 5, 2015 08:53
This commit adds support for deploying a Ceph cluster using
rpc-openstack as well as setting the appropriate configuration for
integration with os-ansible-deployment's Ceph capabilities.

This commit makes use of the following roles to deploy a Ceph cluster:

https://github.com/ceph/ansible-ceph-common
https://github.com/ceph/ansible-ceph-mon
https://github.com/ceph/ansible-ceph-osd

The roles are submodules and so should get automatically cloned at the
same time as os-ansible-deployment.

The default configuration is designed to deploy three mon containers,
one on each controller, with separate physical hosts for OSDs.

The Ceph pools and users required by OpenStack are automatically created
as part of the setup.

Ceph can be deployed on an AIO. When this is done, three OSD containers
are created in an attempt to provide a more realistic representation of
a Ceph cluster.

The rbd pool is created automatically when a new Ceph cluster is
deployed. This pool is not used by an OpenStack deployment but does
consume pgs and so is removed if it exists and is empty.

This commit sets the following in user_extras_variables.yml:

pool_default_size: 3
pool_default_min_size: 2
mon_osd_full_ratio: .90
mon_osd_nearfull_ratio: .80
raw_multi_journal: true
journal_size: 80000
secure_cluster: true
secure_cluster_flags:
  - nodelete

The role defaults are:

pool_default_size: 2
pool_default_min_size: 1
mon_osd_full_ratio: .95
mon_osd_nearfull_ratio: .85
raw_multi_journal: false
journal_size: 0
secure_cluster: false
secure_cluster_flags:
  - nopgchange
  - nodelete
  - nosizechange
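
The two lists above can be compared mechanically. The following Python sketch copies the values from this commit message into two dictionaries and reports every setting whose override differs from the role default. The helper name `changed_settings` is hypothetical and is not part of rpc-openstack or the Ceph roles.

```python
# Values set in user_extras_variables.yml, as listed in the commit message.
overrides = {
    "pool_default_size": 3,
    "pool_default_min_size": 2,
    "mon_osd_full_ratio": 0.90,
    "mon_osd_nearfull_ratio": 0.80,
    "raw_multi_journal": True,
    "journal_size": 80000,
    "secure_cluster": True,
    "secure_cluster_flags": ["nodelete"],
}

# Role defaults, as listed in the commit message.
role_defaults = {
    "pool_default_size": 2,
    "pool_default_min_size": 1,
    "mon_osd_full_ratio": 0.95,
    "mon_osd_nearfull_ratio": 0.85,
    "raw_multi_journal": False,
    "journal_size": 0,
    "secure_cluster": False,
    "secure_cluster_flags": ["nopgchange", "nodelete", "nosizechange"],
}


def changed_settings(defaults, overrides):
    """Map each overridden key to (default, override) where the two differ."""
    return {
        key: (defaults.get(key), value)
        for key, value in overrides.items()
        if defaults.get(key) != value
    }


changed = changed_settings(role_defaults, overrides)
for key in sorted(changed):
    default, override = changed[key]
    print("%s: %r -> %r" % (key, default, override))
```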

The Ceph playbooks have been excluded from ansible-lint because their
inclusion causes the third-party Ceph roles to be tested and they fail
the ansible-lint checks.

Co-Authored-By: git-harry <git-harry@live.co.uk>

This commit adds the following functionality:

 - A new plugin for use in monitoring a ceph cluster
 - An Ansible library for gathering OSD-host facts
 - Updates to the setup-maas.yml playbook and rpc_maas role to deploy
   and configure the ceph_monitoring.py plugin.

Co-Authored-By: Matt Thompson <mattt@defunct.ca>
Co-Authored-By: Hugh Saunders <hugh@wherenow.org>

Add logstash template file for ceph.
Add a beaver ceph.conf file for ceph containers and hosts.

Due to the permissions on /var/log/ceph.log and /var/log/ceph-audit.log
it isn't possible to capture these at this point, and Ceph offers no
options to adjust this, so we will need an alternative solution for
this. This applies to the mons only.
@mancdaz
Contributor

mancdaz commented Oct 5, 2015

Good work peeps. I'm going to merge, so we can backport to kilo and pop some tags.

mancdaz added a commit that referenced this pull request Oct 5, 2015
@mancdaz mancdaz merged commit 79a472b into rcbops:master Oct 5, 2015
claco pushed a commit to claco/rpc-openstack that referenced this pull request Oct 5, 2015
Ceph support
(cherry picked from commit 79a472b)
@git-harry git-harry deleted the ceph-squash branch November 24, 2015 11:12
8 participants