Add monitoring for holland backups #1543

Merged
d34dh0r53 merged 1 commit into rcbops:master on Jan 4, 2017

@BjoernT
Contributor
BjoernT commented Oct 25, 2016 edited

This change adds the MaaS plugin holland_local_check.py, which
monitors the status of previous Holland backups.
The plugin expects at least one completed backupset
for the previous day; Holland runs a local Galera
backup on each container once daily.
The plugin provides two metrics to monitor Holland:
holland_backup_size (megabytes) and
holland_backup_status (boolean).

Connects #1506
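
For readers who want a concrete picture of the check described above, here is a minimal sketch. It is not the code from holland_local_check.py. It assumes that each backup is a directory whose name starts with a YYYYMMDD stamp (the failure messages later in this thread mention dates such as 20161025) and that the plugin reports through `status` and `metric` lines. The function names, the `uint32`/`double` metric types, and the directory layout are illustrative assumptions, and the sketch only checks that a dated directory exists, not that the backupset completed.

```python
import datetime
import os


def find_backups(backup_root, day):
    """Return backup directories under backup_root whose name starts with day as YYYYMMDD."""
    stamp = day.strftime("%Y%m%d")
    if not os.path.isdir(backup_root):
        return []
    return sorted(
        os.path.join(backup_root, name)
        for name in os.listdir(backup_root)
        if name.startswith(stamp) and os.path.isdir(os.path.join(backup_root, name))
    )


def directory_size_mb(path):
    """Sum the sizes of all files below path and convert bytes to megabytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total / (1024.0 * 1024.0)


def check_lines(backup_root, today):
    """Build the plugin output lines for the backup taken the day before today."""
    yesterday = today - datetime.timedelta(days=1)
    backups = find_backups(backup_root, yesterday)
    if not backups:
        # No metrics are emitted on failure, only an error status.
        return [
            "status error Could not find Holland backup from %s"
            % yesterday.strftime("%Y%m%d")
        ]
    # Report the size of the most recent backup from that day.
    size_mb = directory_size_mb(backups[-1])
    return [
        "status okay",
        "metric holland_backup_status uint32 1",
        "metric holland_backup_size double %.2f megabytes" % size_mb,
    ]
```

A caller could print the result with `print("\n".join(check_lines(path, datetime.date.today())))`, where `path` is whatever directory the backupset is written to on the container.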

@BjoernT
Contributor
BjoernT commented Oct 26, 2016

recheck_all

@d34dh0r53
Contributor
19:21:33 failed: [jrpcaioiad-b43] => {"attempts": 5, "changed": false, "cmd": ". /openstack/venvs/maas-r13.1.0rc1/bin/activate\n /opt/rpc-openstack/scripts/rpc-maas-tool.py verify-local", "delta": "0:00:31.792523", "end": "2016-10-26 19:21:32.984147", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-10-26 19:21:01.191624", "stdout_lines": ["The following metrics are required by alarms but not produced by any checks: set(['holland_backup_status'])", "The following checks failed to execute or didn't return 'okay' as their status: [('holland_local_check--jrpcaioiad-b43-galera-container-ab52f086', 'status error Could not find Holland backup from 20161025\\n'), ('holland_local_check--jrpcaioiad-b43-galera-container-227f4241', 'status error Could not find Holland backup from 20161025\\n'), ('holland_local_check--jrpcaioiad-b43-galera-container-15dd5948', 'status error Could not find Holland backup from 20161025\\n')]"], "warnings": []}
19:21:33 stdout: The following metrics are required by alarms but not produced by any checks: set(['holland_backup_status'])
19:21:33 The following checks failed to execute or didn't return 'okay' as their status: [('holland_local_check--jrpcaioiad-b43-galera-container-ab52f086', 'status error Could not find Holland backup from 20161025\n'), ('holland_local_check--jrpcaioiad-b43-galera-container-227f4241', 'status error Could not find Holland backup from 20161025\n'), ('holland_local_check--jrpcaioiad-b43-galera-container-15dd5948', 'status error Could not find Holland backup from 20161025\n')]
19:21:33 
19:21:33 FATAL: all hosts have already failed -- aborting

I'm not sure that we do Holland backups in the gate; perhaps @hughsaunders can verify.
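
The two messages in the log above suggest that the verify step compares the metrics referenced by alarms against the metrics produced by checks, and separately lists checks whose status is not `okay`. The sketch below illustrates that reasoning only. It is not taken from rpc-maas-tool.py, and the function names and input shapes (dictionaries keyed by alarm or check label) are hypothetical.

```python
def missing_metrics(alarm_metrics, check_metrics):
    """Return metrics that some alarm requires but no check produced.

    Both arguments map a label to an iterable of metric names.
    """
    required = set()
    for metrics in alarm_metrics.values():
        required.update(metrics)
    produced = set()
    for metrics in check_metrics.values():
        produced.update(metrics)
    return required - produced


def failed_checks(check_statuses):
    """Return (label, status) pairs whose status line is not 'status okay ...'."""
    failed = []
    for label, status in sorted(check_statuses.items()):
        parts = status.split()
        is_okay = len(parts) >= 2 and parts[0] == "status" and parts[1] == "okay"
        if not is_okay:
            failed.append((label, status))
    return failed
```

Under this reading, a check that exits with `status error` and emits no metrics triggers both messages at once: the check is listed as failed, and any alarm that depends on its metrics reports them as missing.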

@BjoernT
Contributor
BjoernT commented Oct 31, 2016

@d34dh0r53 We can either add a Holland backup test case so that a backup exists during gating, or exclude this check from verify-maas.yml.

@BjoernT
Contributor
BjoernT commented Nov 1, 2016 edited

recheck_ceph

Died with

19:36:33 After this operation, 136 MB of additional disk space will be used.
19:36:33 WARNING: The following packages cannot be authenticated!
19:36:33   qpress
@BjoernT
Contributor
BjoernT commented Nov 2, 2016

recheck_all

@BjoernT
Contributor
BjoernT commented Nov 4, 2016

The issue is related to:

W: GPG error: https://repo.percona.com trusty InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9334A25F8507EFA5

It was possibly caused by the update to the Percona repo fingerprint inside the galera server role.

@cloader89
Contributor

@BjoernT The SHA was updated recently, so a rebase couldn't hurt if you want to try it. This error was happening two weeks ago.

@BjoernT
Contributor
BjoernT commented Nov 9, 2016

recheck_all

@BjoernT
Contributor
BjoernT commented Nov 14, 2016

recheck_all

@BjoernT
Contributor
BjoernT commented Dec 7, 2016

recheck_ceph

Failed to validate the SSL certificate for monitoring.api.rackspacecloud.com:443

@BjoernT
Contributor
BjoernT commented Dec 8, 2016

recheck_all

@alextricity25
Contributor

This is blocked by some issues we are having with the MaaS account we use in our gating: rcbops/u-suk-dev#809

@BjoernT
Contributor
BjoernT commented Dec 8, 2016

@alextricity25 Thanks, I knew about this already. I informed the MaaS team yesterday, since the gate jobs of this PR were virtually the first to run into it.

@BjoernT
Contributor
BjoernT commented Dec 8, 2016

recheck_all

@BjoernT
Contributor
BjoernT commented Dec 9, 2016

recheck_all

@BjoernT
Contributor
BjoernT commented Dec 15, 2016

recheck_swift

@BjoernT
Contributor
BjoernT commented Dec 27, 2016

recheck_all

@BjoernT BjoernT Add monitoring for holland backups
The MaaS plugin `holland_local_check.py` is added to
monitor the status of previous Holland backups.
The plugin expects at least one completed backupset
for the previous day. Holland runs a local Galera
backup on each container once daily.
The metrics `holland_backup_size` (megabytes) and
`holland_backup_status` (boolean) are provided to
monitor Holland.

Closes-Bug: #1506
c280bc0
@BjoernT
Contributor
BjoernT commented Jan 3, 2017

@alextricity25 Please review. I tested it in my hardware lab.

@major
major approved these changes Jan 3, 2017

Nicely done on the skipped checks logic. LGTM.

@d34dh0r53

Looks good to me, thanks @BjoernT 👍

@d34dh0r53 d34dh0r53 merged commit b0f6348 into rcbops:master Jan 4, 2017

3 checks passed

ceph Build finished.
continuous-integration/travis-ci/pr The Travis CI build passed
swift Build finished.