
Ceilometer workload partitioning with tooz & redis #447

Closed


@eglynn eglynn commented Jan 7, 2015

The quickstack::pacemaker::ceilometer puppet class now allows
redis to be specified as the backend for tooz, to be used for
workload partitioning in the ceilometer central agent and
alarm evaluator.

We do not create a pacemaker resource for redis, as this will
instead be self-monitored via the redis-sentinel service (via
a subsequent patch, once the support for sentinel in tooz is
packaged in python-tooz).

@@ -0,0 +1,14 @@
class quickstack::pacemaker::redis(
$bind_host = '127.0.0.1',
Member commented:
How will using 127.0.0.1 work in an HA setup? If we are not load balancing redis (similar to memcached), it seems like we would want to bind either to 0.0.0.0 or to the map_params("local_bind_addr") value; otherwise each node would only get the cache from there instead of potentially elsewhere, if that is where the right copy is.

@jguiditta (Member)

Aside from the couple of comments I had, this looks pretty reasonable, thanks! If we want to try to make this into the next release, we need an RFE in bugzilla set against openstack-foreman-installer and version 6; then we have something to set QE on for testing once this gets in.


eglynn commented Jan 8, 2015

Thanks for the detailed feedback Jay, responding in one consolidated comment.

> How will using 127.0.0.1 work in an HA setup?

That's just the default bind_host, which is always overridden in practice (IIUC).

I just followed what seemed to be the existing practice whereby the default is generally set to 127.0.0.1 in the quickstack puppet classes, but I'd be happy to change to 0.0.0.0 if that makes more sense.

> if we are not load balancing redis (similar to memcached)

Redis is not quite as limited as memcached (where IIUC, there is no replication/synchronization at all between the memcached servers).

But yes, it would not make sense to LB over multiple Redis instances, as only one is deemed the master (the others are effectively passive slaves).

> seems to me like this should be something more similar to how we currently point at a list of memcached servers. Is there a reason we don't want that?

The problem here is tooz (the library that ceilometer uses for coordination, which itself then interacts with redis) does not IIUC allow for an address list to be specified via a URL.
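By way of illustration, this limitation means the coordination setting in ceilometer.conf ends up pointing at exactly one redis endpoint (a sketch only; the host and port below are example values):

```ini
[coordination]
# tooz accepts a single host:port pair in this URL, hence
# the need to pick one redis (the $_backend_ips[0] choice)
backend_url = redis://192.0.2.10:6379
```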

> If so, why is the first one in the list of servers always correct ([0])?

We need to pick one redis for now, so this is it.

Once openstack-puppet-modules is updated to accept the latest puppet-redis patches for redis-sentinel (pull requests [1], [2]) and we have a new version of python-tooz rebuilt, this will change as follows:

  1. identify the redis-sentinel running on one of $_backend_ips[] as the "contact" sentinel that the ceilometer tooz/redis driver contacts in the first instance to determine the current master
  2. identify the redis instance running on $_backend_ips[0] as the master via its redis.conf
  3. identify the redis instances running on the other $_backend_ips[1:-1] as slaves via their redis.conf

The sentinel will then mediate slave->master promotion and failover when the old master goes away.
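In redis/sentinel terms, the three steps above would boil down to config along these lines (a sketch only; "mymaster", the addresses, and the quorum value are illustrative):

```conf
# redis.conf on $_backend_ips[0] (the initial master): no slaveof directive

# redis.conf on each remaining backend (the initial slaves):
slaveof 192.0.2.10 6379

# sentinel.conf on each node: monitor the initial master; the trailing 2
# is the quorum of sentinels that must agree the master is down
sentinel monitor mymaster 192.0.2.10 6379 2
```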

However, what I'm missing is how, in quickstack terms, to approach ensuring a different action is taken on one controller as opposed to another.

i.e. can you point me to an existing pattern where different parameters are passed to a particular puppet class on a subset of the controllers?

(in this case $slaveof being set for the ::redis class for all but one of the controllers).

> we need an RFE in bugzilla set against openstack-foreman-installer and version 6

See https://bugzilla.redhat.com/1180158

Cheers,
Eoghan

[1] redhat-openstack/openstack-puppet-modules#209
[2] redhat-openstack/openstack-puppet-modules#207

@jguiditta (Member)

On 08/01/15 06:21 -0800, Eoghan Glynn wrote:

> Thanks for the detailed feedback Jay, responding in one consolidated comment.
>
> > How will using 127.0.0.1 work in an HA setup?
>
> That's just the default bind_host, which is always overridden in practice (IIUC).
>
> I just followed what seemed to be the existing practice whereby the default is generally set to 127.0.0.1 in the quickstack puppet classes, but I'd be happy to change to 0.0.0.0 if that makes more sense.

That is true for the non-pacemaker classes, but in the pacemaker classes, we determine the bind_addrs for all services based on $local_bind_addr, which is defined in quickstack::pacemaker::params. This ensures the service can be reached by other machines in the cluster, and on the correct network.

> > if we are not load balancing redis (similar to memcached)
>
> Redis is not quite as limited as memcached (where IIUC, there is no replication/synchronization at all between the memcached servers).
>
> But yes, it would not make sense to LB over multiple Redis instances, as only one is deemed the master (the others are effectively passive slaves).

So, is this then more like mongo, where redis elects its master? Or do we have to explicitly tell it which node is master? The answer to this will actually inform my responses to your other questions, so I am going to hold on those for now.

> > seems to me like this should be something more similar to how we currently point at a list of memcached servers. Is there a reason we don't want that?
>
> The problem here is tooz (the library that ceilometer uses for coordination, which itself then interacts with redis) does not IIUC allow for an address list to be specified via a URL.
>
> > If so, why is the first one in the list of servers always correct ([0])?
>
> We need to pick one redis for now, so this is it.
>
> Once openstack-puppet-modules is updated to accept the latest puppet-redis patches for redis-sentinel (pull requests [1], [2]) and we have a new version of python-tooz rebuilt, this will change as follows:
>
> 1. identify the redis-sentinel running on one of $_backend_ips[] as the "contact" sentinel that the ceilometer tooz/redis driver contacts in the first instance to determine the current master
> 2. identify the redis instance running on $_backend_ips[0] as the master via its redis.conf
> 3. identify the redis instances running on the other $_backend_ips[1:-1] as slaves via their redis.conf

Are these changes targeted both in opm and here for OSP 6 GA as well, or is this going to be an async tweak/improvement?

> The sentinel will then mediate slave->master promotion and failover when the old master goes away.
>
> However, what I'm missing is how, in quickstack terms, to approach ensuring a different action is taken on one controller as opposed to another.
>
> i.e. can you point me to an existing pattern where different parameters are passed to a particular puppet class on a subset of the controllers?
>
> (in this case $slaveof being set for the ::redis class for all but one of the controllers).

I have a couple of thoughts on this, but it depends on your answer to the above question on how master is elected.

> > we need an RFE in bugzilla set against openstack-foreman-installer and version 6
>
> See https://bugzilla.redhat.com/1180158

Thanks!

> Cheers,
> Eoghan
>
> [1] redhat-openstack/openstack-puppet-modules#209
> [2] redhat-openstack/openstack-puppet-modules#207



@jguiditta (Member)

Oh, and one general request on the commit message. Would you kindly update it on the next revision to be of the form:

BZ #1180158: Ceilometer workload partitioning with tooz & redis

https://bugzilla.redhat.com/show_bug.cgi?id=1180158

<other text you already have>

It makes it easier for me to generate the rpm changelog. Thanks!


eglynn commented Jan 9, 2015

Thanks again for the further feedback, comments inline ...

> > I just followed what seemed to be the existing practice whereby the default is generally set to 127.0.0.1 in the quickstack puppet classes, but I'd be happy to change to 0.0.0.0 if that makes more sense.
>
> That is true for the non-pacemaker classes, but in the pacemaker classes, we determine the bind_addrs for all services based on $local_bind_addr, which is defined in quickstack::pacemaker::params. This ensures the service can be reached by other machines in the cluster, and on the correct network.

A-ha, cool, got it now, I'll change that.

> So, is this then more like mongo, where redis elects its master? Or do we have to explicitly tell it which node is master? The answer to this will actually inform my responses to your other questions, so I am going to hold on those for now.

We have to explicitly tell the ::redis puppet class whether the service running on the current host is going to be the initial master (which is the default) or be a slave (by setting the $slave_of parameter to the ::redis class).

If that initial master fails, the redis-sentinel then decides which slave is promoted to be the new master.

If the original master fails, a slave is promoted in its place ... but then if the old master is restarted, the sentinels tell it to revert to being a slave (regardless of its master-like redis.conf).

So it's somewhat less dynamic than the mongodb replicaset scenario, in the sense that the initial master is set in config, but things become more dynamic thereafter.

It may seem odd that mastership and slavedom are initially statically defined in config, but then dynamically managed thereafter by the sentinels ... IIUC this is because the master->slave replication was developed first, with the sentinel mechanism added later, layered over the pre-existing replication feature.

> > Once openstack-puppet-modules is updated to accept the latest puppet-redis patches for redis-sentinel (pull requests [1], [2]) and we have a new version of python-tooz rebuilt, this will change as follows:
> >
> > 1. identify the redis-sentinel running on one of $_backend_ips[] as the "contact" sentinel that the ceilometer tooz/redis driver contacts in the first instance to determine the current master
> > 2. identify the redis instance running on $_backend_ips[0] as the master via its redis.conf
> > 3. identify the redis instances running on the other $_backend_ips[1:-1] as slaves via their redis.conf
>
> Are these changes targeted both in opm and here for OSP 6 GA as well, or is this going to be an async tweak/improvement?

We're still trying to get the sentinel-related changes into o-p-m with a view to getting them into OSP 6 GA.

Testing on the o-p-m pull-requests showed up a problem that cdent fixed earlier today, so we're awaiting feedback on the updated PRs.

(note that the current state of this pull-request does not require the sentinel changes, as I avoided depending on those puppet-redis changes prior to acceptance into o-p-m)

> > However, what I'm missing is how, in quickstack terms, to approach ensuring a different action is taken on one controller as opposed to another.
> >
> > i.e. can you point me to an existing pattern where different parameters are passed to a particular puppet class on a subset of the controllers?
> >
> > (in this case $slaveof being set for the ::redis class for all but one of the controllers).
>
> I have a couple of thoughts on this, but it depends on your answer to the above question on how master is elected.

Cool, those would be great to hear.

> Oh, and one general request on the commit message. Would you kindly update it on the next revision to be of the form:
>
> BZ #1180158: Ceilometer workload partitioning with tooz & redis
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1180158

Sure thing, will do!

Cheers,
Eoghan


eglynn commented Jan 9, 2015

@jguiditta: just thinking about a possible approach to driving slightly different behaviours on each of multiple controller nodes (in the context of setting the $slaveof param on all but one of the instantiations of the ::redis class, as discussed in my previous comment).

How about comparing the current $ipaddress_eth0 fact against the first element of the map_params("lb_backend_server_addrs") array?

If lb_backend_server_addrs is ordered the same way on each controller node, I'm thinking this comparison should only be equal on a single controller node ... am I on the right track there?

@jguiditta (Member)

On 09/01/15 09:04 -0800, Eoghan Glynn wrote:

> We have to explicitly tell the ::redis puppet class whether the service running on the current host is going to be the initial master (which is the default) or be a slave (by setting the $slave_of parameter to the ::redis class).
>
> If that initial master fails, the redis-sentinel then decides which slave is promoted to be the new master.
>
> If the original master fails, a slave is promoted in its place ... but then if the old master is restarted, the sentinels tell it to revert to being a slave (regardless of its master-like redis.conf).
>
> So it's somewhat less dynamic than the mongodb replicaset scenario, in the sense that the initial master is set in config, but things become more dynamic thereafter.
>
> It may seem odd that mastership and slavedom are initially statically defined in config, but then dynamically managed thereafter by the sentinels ... IIUC this is due to the master->slave replication being developed first, and then the sentinel mechanism added later, layered over the pre-existing replication feature.

So, I could be wrong, but this strikes me as similar to the galera setup, so maybe something similar could be done here. I would point you at both the pacemaker/galera manifest and the load_balancer/galera manifest to see what you think. Perhaps we could do the same thing where we use the stick-table option there?

The other approach I was thinking could work would be to use the cluster_control_ip to set the initial master, and pass that in instead of the array[0] that is there now. You could check in the manifest using a stdlib function whether "I have this IP address (cluster_control)" - this might itself need to be a wrapper function, or call one; I don't think we have exactly that right now, though it would be fairly easy to write (under lib/). That would take care of the initial setup for each node. The failover case I am less sure of, since it seems like without haproxy or similar, the ceilometer configs would be pointing at the wrong (dead, old master) server. Is there some magic correction here I am missing? This seems exactly the case for pacemaker, haproxy and VIPs, but maybe I am missing something here.

It seems like Node 1 dies, and 2/3 decide that 2 is the new master. OK, reasonable enough, but do they change their own config files at this point? If not, do they disregard the master/slave conf settings after initial bootstrap? This may be the better scenario, assuming there is some way we can query redis to say 'which of you is now master?'. If we can do that, then we could create a fact to update the ceilometer setting that points to the (now dead) node one as the redis server. This would restart the service (ceilo) if needed to pick up the change. However, there can be significant lag between when node 1 fails and node 2 takes over, and the next time puppet were to run and pick up the change in state. This is usually where pacemaker comes in, perhaps with a custom resource like galera has (though hopefully less complex).

> However, what I'm missing is how, in quickstack terms, to approach ensuring a different action is taken on one controller as opposed to another.
>
> i.e. can you point me to an existing pattern where different parameters are passed to a particular puppet class on a subset of the controllers?
>
> (in this case $slaveof being set for the ::redis class for all but one of the controllers).

So, for initial setup, this would be the pattern I described above - set the ip you are passing in to whichever of lb_backend_server_addrs exists on the same machine as cluster_control_ip. If the machine running this manifest doesn't have the control ip, then set it as a slave (again, this would require some little helper function, leveraging the stdlib 'has_interface_with'). Perhaps this would be something like:

  get_matching_ip($lb_backend_server_addrs, $cluster_control_ip)

And that could return the ip or a nil/blank string? I think something along these lines would work. You can certainly override the variable at the per-host level in foreman, but I think that relies too much on foreman being there. We are trying to keep things so foreman is not an absolute requirement for quickstack to work, even though we have clearly built it for that context first. Staypuft doesn't support per-host overrides in the wizard though, I don't think, so if you want this to work there, that would not be an option I guess.
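For what it's worth, the get_matching_ip helper mentioned above could be sketched as a tiny function under lib/. This is hypothetical: the function name comes from the discussion, and the Puppet parser-function scaffolding (and the has_interface_with check on the result) is omitted so the selection logic can be run standalone:

```ruby
# Hypothetical sketch of get_matching_ip: return the backend address
# equal to the cluster control IP if it appears in the list (the node
# holding that address would become the initial master), or an empty
# string so the caller can configure the node as a slave instead.
def get_matching_ip(backend_addrs, control_ip)
  backend_addrs.include?(control_ip) ? control_ip : ''
end

# Example: control IP is one of the backends -> candidate master address
puts get_matching_ip(['192.0.2.10', '192.0.2.11'], '192.0.2.10')  # -> 192.0.2.10
# Example: no match -> empty string, i.e. configure as slave
puts get_matching_ip(['192.0.2.10', '192.0.2.11'], '192.0.2.99')  # -> (empty string)
```

In a real manifest the returned address would still need to be checked against the local interfaces (e.g. via has_interface_with) to decide whether this particular node is the master.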

Hope that is at least somewhat helpful.

-j


eglynn commented Jan 11, 2015

Hi Jason,

Thanks again for taking the time to consider this carefully, this discussion is very helpful :)

TBH I think I've missed the point about the relevance of the galera stick-table config, as I thought that simply controls the size etc. of the data structure used by HAProxy to implement stickiness.

However, I like the other approach you've suggested around checking for an address match, having expressed similar (though less developed) thoughts in my previous comment. In fact, IIUC we already use a similar approach in the quickstack::pacemaker::rabbitmq class in order to ensure the first node in the rabbitmq cluster starts up first[1].

So I'm wondering if that might be sufficient in this case also, i.e. whether we'd even need to worry about matching the cluster_control_ip to a particular lb_backend_server_addr?

i.e. would simply designating one of those lb_backend_server_addrs as special (say the first or the last) suffice to drive the different behaviour on a single controller node that we need? ... along the lines of the following pattern:

  $_nodes = map_params('lb_backend_server_addrs')
  $initial_master = $_nodes[0]
  if has_interface_with("ipaddress", $initial_master) {
      $_slaveof = undef
  } else {
      $_slaveof = $initial_master
  }

  class { '::quickstack::pacemaker::redis':
    bind_host => map_params("local_bind_addr"),
    port      => $coordination_backend_port,
    slaveof   => $_slaveof,
  }

BTW you're correct in your thinking the mastership assignment in redis config is effectively disregarded when a previously-failed master is restarted, in this case the sentinels handle telling the old master that it's no longer top-dog.

You're also correct in observing that the complete death of the node hosting the "contact" sentinel would be problematic. Unfortunately tooz does not currently allow us to identify multiple sentinels in the backend URL, so we'll need to either quickly add that to tooz or put the sentinels all behind a VIP (assuming that's feasible?)

Cheers,
Eoghan

[1] https://github.com/redhat-openstack/astapor/blob/master/puppet/modules/quickstack/manifests/pacemaker/rabbitmq.pp#L110


eglynn commented Jan 12, 2015

@jguiditta: I inadvertently caused this pull-request to be closed, and couldn't re-open it. So I've created a fresh pull-request[1] with the latest version of the patch, illustrating the discussion in the previous comments about setting the $slaveof status of the redis on all but one of the controllers.

Can we continue the discussion on the new pull-request?

[1] #449

@jguiditta (Member)

On 12/01/15 05:28 -0800, Eoghan Glynn wrote:

> @jguiditta: I inadvertently caused this pull-request to be closed, and couldn't re-open it. So I've created a fresh pull-request[1] with the latest version of the patch, illustrating the discussion in the previous comments about setting the $slaveof status of the redis on all but one of the controllers.
>
> Can we continue the discussion on the new pull-request?

Sure thing
