Charm Developer: Need best practice page on reactive patterns and anti-patterns #1263

Open
arosales opened this Issue Jul 20, 2016 · 18 comments

Comments

Projects
None yet
9 participants
Contributor

arosales commented Jul 20, 2016

When creating a reactive layered charm a new charmer should know of best practices and common pitfalls in creating states so they can avoid races and deadlocks when possible for community learning.

This page would be to start documenting some best practices and patterns, and also things to avoid. The page may start small but should be added to as we learn better patterns.

This page should also be linked to in the getting started Charm Developer docs.

Race conditions are the big one we see in CI runs. For avoiding them, here are some of my initial thoughts based on my experience with debugging some of our charms:

  • Remember that you are building a finite state machine. You have a list of possible events that come in from hooks, and quasi-events from @when_not decorators, and you need to think about all the possible states your charm could be in when each event arrives. Many of them won't matter, but a lot of them will.
  • You should draw out a diagram of the states and the possible transitions between states. Draw it for yourself, if you are a visual learner, and if you're not, draw it anyway, for the others who will come along and try to understand your charm.
  • Name your states as actual states in your FSM, not as actions that need to occur. e.g. Don't have this:
@hook('install')
def install_hook():
    set_state('myapp.install')

@when('myapp.install')
def do_myapp_install():
    ...
    set_state('myapp.configure')

@when('myapp.configure')
def do_myapp_configure():
   ...
   set_state('myapp.start')

The problem with the above approach is that you're triggering pseudo-events, but you don't know what other events might occur in the meantime. Instead, make assertions about what state your method leaves the charm in, and write handlers based on those states, e.g.:

@when_not('myapp.installed')
def install_myapp():
    ...
    set_state('myapp.installed')

@when('myapp.installed')
@when_not('myapp.configured')
def configure_myapp():
    """
    Handles both initial configuration, and reconfiguration from
    config-changed hook
    """
    ...
    remove_state('myapp.started')
    set_state('myapp.configured')

@when('myapp.configured')
@when_not('myapp.started')
def start_myapp():
    """
    Also restarts myapp if it's already running
    """
    ....
    set_state('myapp.started')

@hook('config-changed')
def juju_config_changed():
    """
    force myapp to be reconfigured;  you might want to be more
    selective about this, if there are certain config values which
    shouldn't force a restart
    """
    ...
    remove_state('myapp.configured')

Caveat: I'm not sure how well the above interacts with traditional juju hooks like install, start, stop, etc. It seems to work OK for us in practice, but there could be some gotchas that need highlighting.

  • Generally, don't use @only_once unless it absolutely doesn't matter when it runs. (Hint: it probably does matter.)
Owner

marcoceppi commented Jul 21, 2016

Best practice is to never use @hook at all, we're planning on phasing that
out or making it less prominent in future releases because it's so
problematic.

On Wed, Jul 20, 2016 at 8:18 PM Paul Gear notifications@github.com wrote:

Race conditions are the big one we see in CI runs. For avoiding them, here
are some of my initial thoughts based on my experience with debugging some
of our charms:

Remember that you are building a finite state machine
https://en.wikipedia.org/wiki/Finite_state_machine. You have a list
of possible events that come in from hooks, and quasi-events from
@when_not decorators, and you need to think about all the possible
states your charm could be in when each event arrives. Many of them won't
matter, but a lot of them will.

You should draw out a diagram of the states and the possible
transitions between states. Draw it for yourself, if you are a visual
learner, and if you're not, draw it anyway, for the others who will come
along and try to understand your charm.

Name your states as actual states in your FSM, not as actions that
need to occur. e.g. Don't have this:

@hook('install')def install_hook():
set_state('myapp.install')
@When('myapp.install')def do_myapp_install():
...
set_state('myapp.configure')
@When('myapp.configure')def do_myapp_configure():
...
set_state('myapp.start')

The problem with the above approach is that you're triggering
pseudo-events, but you don't know what other events might occur in the
meantime. Instead, make assertions about what state your method leaves the
charm in, and write handlers based on those states, e.g.:

@when_not('myapp.installed')def install_myapp():
...
set_state('myapp.installed')
@When('myapp.installed')@when_not('myapp.configured')def configure_myapp():
""" Handles both initial configuration, and reconfiguration from config-changed hook """
...
remove_state('myapp.started')
set_state('myapp.configured')
@When('myapp.configured')@when_not('myapp.started')def start_myapp():
""" Also restarts myapp if it's already running """
....
set_state('myapp.started')
@hook('config-changed')def juju_config_changed():
""" force myapp to be reconfigured; you might want to be more selective about this, if there are certain config values which shouldn't force a restart """
...
remove_state('myapp.configured')

Caveat: I'm not sure how well the above interacts with traditional juju
hooks like install, start, stop, etc. It seems to work OK for us in
practice, but there could be some gotchas that need highlighting.

  • Generally, don't use @only_once unless it absolutely doesn't matter
    when it runs. (Hint: it probably does matter.)


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1263 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAET1VCAaYe2B6Ul93HFUEShq1YG9bDYks5qXrqqgaJpZM4JRSrw
.

paulgear commented Jul 21, 2016

@marcoceppi With most hooks I think that would be fine; stop might need to stay to make destroy-unit/destroy-service work smoothly; ditto upgrade-charm so we can do the right thing there.

But what is your plan with config-changed? I have found that the config.changed.NAME states don't behave reliably in practice - mostly showing as changed when they haven't, but IIRC the other way around happens sometimes too. I dug into layer-basic for a little while to try to work out why, but didn't find it by the end of the time I had allocated. :-\

@evilnick evilnick added the eco-team label Jul 21, 2016

hloeung commented Jul 22, 2016

@marcoceppi , can you explain more as to why you think we should avoid @hook's?

Contributor

mbruzek commented Jul 22, 2016

@hloeung I did not believe this statement when I heard it, but it has been my experience that reactive code can be written to never need the hook decorators.

The state engine is evaluated every time an event runs in Juju. To put this another way the state engine is checked for matching hooks when ANY hook (or event) is evaluated like update-status which is run every 5 seconds. So it is possible to use the other decorators to construct a condition (set of states) that obviates the need for @hook.

For traditional charms to have mutable configuration, we found most of code (including installation) had to be in the config-changed hook because that was the only way to get the code to run when configuration values changed. With reactive you can use states to run any method at any time, and no longer need to be waiting for a config-changed event. If some configuration option changes that would force a reinstall the reactive install method could have that as a when specific configuration changes (that would necessitate a reinstall) then do it.

The replacement for a config-changed hook is to have a method decorated with states that could be true at any time such as:

@when('myapp.configured)
@when_any('config.changed.option1`, `config.changed.option2`)
def reconfigure():
    """The configuration has changed, write these values to the file and remove the started state."""
    ...
    remove-state('myapp.started')

@paulgear if you can give us some examples of when config.changed.NAME do not behave correctly we could fix thoses bugs. Please file issues here: https://github.com/juju/charm-tools/issues

Contributor

mbruzek commented Jul 22, 2016

@paulgear wrote some great examples of reactive here. This reminded me of an email I sent to a partner 3 months ago about best practices for reactive who write all their charms in bash:


Here are some of the tips and trick I have learned when writing reactive charms. I have attached a simple bash example that helps illustrate the points in this email.

#!/bin/bash
source charms.reactive.sh
set -e

@when_not 'db2.installed'
function install_db2() {
  local package_name=$(config-get 'db2_package_name')
  local arch=$(uname -m)
  # install DB2 here
  set_state 'db2.installed'
}

@when 'db2.installed'
@when_not 'db2.configured'
function setup_db2() {
  # Configure DB2 here.
  set_state 'db2.configured'
}

@when 'db2.configured'
function configuration-changed() {
  local package_name=$(config-get 'db2_package_name')
  local current_package=$(/usr/bin/db2 --version)
  # If the package changed remove the db2.installed state, that will cause a reinstall.
}

reactive_handler_main

@hook decorators:

You almost never need @hook if you structure the reactive states correctly. I always have an "entry point" or a function that is called when no states are set (the install_db2 function in this example), and the function always sets the "[name].installed" state when completing successfully. This function could be equivalent to the install hook, but is much better because removing the "[name].installed" state will cause the software to install again on the next reactive event cycle, where the install hook is only called on deploy and upgrade.

Immutability:

I normally have an "exit point" that is a state when everything completes successfully. The reactive framework evaluates all the states that match the @when and @when_not decorators each time a states changes, or any hook is run. So my "exit state" function acts as a config-changed hook because the code will be run every time the reactive framework evaluates states. This helps prevent immutable configuration in the charms. When configuration does change, I often just remove states and that makes the install or configure methods run again, depending on the situation.

You can of course make as many states and functions as makes sense for your code, but these are the basic ones that I start with on each reactive layer.

@mbruzek, I'll see if I can dig up my conversation with @chuckbutler in Freenode #juju about this; we messed around for quite a while without coming up with a good solution other than to handle it manually in the config-changed hook.

One other thing I think needs to go into the docs: because every reactive state has the possibility to fire on every juju hook, be super-careful with anything that might run every time: such handlers must be efficient and non-side-effect-inducing, because they will be invoked every 5 minutes with the update-status hook.

jacekn commented Jul 27, 2016

@paulgear

@hook('config-changed') is not really necessary. The same behavior can be achieved using something like this:

@when('myapp.started')
def check_config():
    if data_changed('mycharm.config', hookenv.config()):
        # At least one config option has changed
        # force service reconfiguration
        remove_state('myapp.configured')

Above will run every time config-changed is triggered but also give you great flexibility. You could, for example, check which config options have changed and only do partial reconfiguration.

Above code also avoids "config.changed" states which are IMO super confusing. Config updates are events not states

Owner

marcoceppi commented Jul 27, 2016

@jacekn that's one way to tackle it, another is:

@when('myapp.started')
@when('config.changed')  # Or 'config.changed.key'
def configuration_changed():
    remove_state('myapp.configured')

Why is config.changed super confusing? Is it becuase it's an implicit "event" modeled as a state?

https://github.com/juju-solutions/layer-basic#reactive-states

jacekn commented Jul 27, 2016

@marcoceppi

Yes it's exactly that. Events modelled as states that are magically removed after each hook run (like "config.changed" and friends are) can lead to confusion and are non-obvious to me.

It can also make people believe that this is how states are supposed to be used - as "signals" rather than persistent states.

Owner

marcoceppi commented Jul 27, 2016

We, and the community agree with this. Your alternative approach is a good way to handle that now, there are other discussions about how to model "events" instead of having it as a state that we're targeting for 2.0

stub42 commented Jul 28, 2016

@jacekn - your example only works if you have a single handler doing the data_changed check with that key, so if should probably be mycharm.mymodule.check_config.config to ensure unique keys. It also triggers if the data was changed before myapp.started was set, such as in a previous hook (which could be a useful technique in some cases, but not here were you are going to reconfigure a service straight after starting it up for the first time, presumably after you just configured it).

It becomes less confusing when you accept that config.changed is actually a state. Events are in the past. The config.changed state is current and remains set for the duration of an entire hook. It is a state set by a different layer, and that layer clearly defines when it is activated and when it is deactivated. There are many states set by different layers which are activated and deactivated at times you don't control - why are these different?

Perhaps I started this by choosing my tenses unwisely - 'leadership.changed' (past tense, indicates an event in the past) rather than 'leadership.changing' (present continuous, indicates a current state)
or 'leadership.change' (present, event happening now, or imperative, an order you can respond to).

But on the other hand, 'changed' reads better and matches other states like 'started', 'stopped', 'destroyed', 'configured' which are all both events and states. 'rebooted' is interesting, and is similar to 'changed' but I suspect people would have much less problem with it.

If naming would improve the situation, leadership and basic layers can easily change (heh), continuing to support the old naming indefinitely.

This issue is of course about documentation that needs to exist now rather than for a future version of charms.reactive, and in my opionion the config.changed state is currently superior if explained well (and we want this documentation because we need things like this explained well ;) )

tbh. I suspect adding events is going to make things more confusing, unless it is simply renaming 'states' to 'signals' or 'flags' or some other terminology. And event is something that happened in the past. Do we limit it to mean 'sometime between the start of the hook and now', or can the event have happened in a previous hook but only now triggering a handler due to other constraints on the handler? eg. a file-changed event triggered in an install hook that can't be handled until

jacekn commented Jul 28, 2016

@jacekn - your example only works if you have a single handler doing the data_changed check with that key, so if should probably be mycharm.mymodule.check_config.config to ensure unique keys.

True. In my charms I always have single place where to I check for config changes so I never had to deal with this problem

It also triggers if the data was changed before myapp.started was set, such as in a previous hook (which could be a useful technique in some cases, but not here were you are going to reconfigure a service straight after starting it up for the first time, presumably after you just configured it).

Why would it be triggered before "myapp.started" is set? My expectation is that @When('myapp.started') prevents that from happening.

One small downside to my approach is that first time check_config is called it will always detect changes because of the way data_changed works. I don't consider it to be a big deal because hooks should be idempotent.

It becomes less confusing when you accept that config.changed is actually a state. Events are in the past. The config.changed state is current remains set for the duration of an entire hook. It is a state set by a different layer, and that layer clearly defines when it is activated and when it is deactivated. There are many states set by different layers which are activated and deactivated at times you don't control - why are these different?

Thanks, this explanation makes a lot of sense, especially after I read your naming changes ideas.

In any case - I totally agree that this is mostly documentation problem and by writing good docs we can make both approaches work and easy to understand.

stub42 commented Jul 28, 2016

@jacekn It won't be triggered before myapp.started is set. What I'm saying is that hookenv.config() may have changed several hooks ago. The handler will be triggered as soon as the started state gets set and the configured state removed if the config has changed since the last time the handler ran (or if it has never run before). This is different to @marcoceppi 's example, where the handler is only triggered if a) this is the first hook that sees the config change and b) the started state is already set or gets set during the hook. Both patterns are useful, but Marco's seems better for this use case (reconfigure a running service if the config changes).

So I got rid of some bogus exception handling in a reactive charm just now, and replaced a lot of it with this:

hookenv.status_set('blocked', 'Could not do X')
do_x()
hookenv.status_set('active', 'Successfully did X')

This means that if do_x() throws an uncaught exception, it'll show up in the logs and the blocked status will get back to juju.

Also I filed a bug over in launchpad that resulted in #1288 and juju-solutions/interface-juju-info#3

stub42 commented Aug 9, 2016

@spads-spads if do_x() throws an uncaught exception, the unit will end up in an error state

spads-spads commented Aug 9, 2016

good point. I was working with someone else's code that tried to set states in bare except: clauses, so this way was a direct translation but an improvement.

@evilnick evilnick added the blocked label Jan 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment