================
=== Twitcher ===
================

1. License
2. Overview
3. Installation
4. Configuration Language
  4.1 Examples
  4.2 Things to Avoid
5. Known Issues
6. Future Features
7. Change Log

1. License
==========

This software is licensed under the Apache 2.0 license. Please see LICENSE in
the top level of this git repository for more information.

2. Overview
===========

Twitcher is a minimal tool used to run scripts when a znode in ZooKeeper
changes. The intent is to allow various system tasks to take place on a
watch-based model rather than via polling.

Twitcher is best described through an example of a problem it solves well.
Let's assume we need a single git repository to be checked out and up to date
on 100 machines in a cluster. The typical solution is to add a cron job that
performs a git pull every minute. This can be problematic, as your git server
has to endure the load of 100 servers performing a pull every minute.

Instead, you can configure Twitcher on all machines to watch /git/head on
your ZooKeeper instance. When that znode changes, Twitcher can run git pull.
Add a post-commit script in your main git repository that pokes that znode
and all machines will fetch updates within a few seconds of a commit, without
constant fetching when no changes are being made.

3. Installation
===============

The default installation of Twitcher places the twitcher binary
(bin/twitcher) in /usr/sbin and the twitcher module (twitcher/*) in the
site-packages directory, which is often
/usr/lib/python2.6/site-packages/twitcher.

The configs are read recursively from /etc/twitcher while twitcher is
running. Twitcher uses inotify to detect when config files change so that it
can reload them automatically. This allows configs to be added and removed
quickly without restarting the server binary. If a config fails to parse then
it will not be updated in the running binary and a log message will be
written.
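As an illustration of dropping a config into /etc/twitcher, the git scenario
from the overview might be expressed roughly as follows. This is only a
sketch: the file name and the checkout path /srv/repo are hypothetical, and
RegisterWatch and Exec are the functions described in section 4.

```python
# /etc/twitcher/git_pull.twc  (hypothetical file name)
# Run "git pull" in a local checkout whenever /git/head changes.
RegisterWatch(
    znode='/git/head',
    # /srv/repo is a placeholder for wherever the repository is checked out.
    action=Exec('cd /srv/repo && git pull'),
    description='git pull when /git/head changes'
)
```

Because configs are reloaded when files change, saving such a file should be
enough for a running Twitcher to pick it up.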
By default Twitcher logs to syslog under the daemon facility.

4. Configuration Language
=========================

The configuration language for Twitcher is plain Python. Each configuration
must register a znode and an action for each watch it wants. This is done via
the RegisterWatch function, which takes several arguments.

RegisterWatch():
    Creates a watch that will run a script when a ZooKeeper node is updated.

    znode: The znode in ZooKeeper that is to be watched.
    action: A function or lambda that will perform an action when the znode
        is modified. The most common use here is to call Exec(), which is
        documented below.
    pipe_stdin: If True then the contents of 'znode' will be piped to the
        stdin of the process running the action. The default is True.
    run_on_load: If True then the action will be executed when Twitcher
        starts. Without this your action may miss updates. The default is
        True.
    run_mode: Defines how Twitcher reacts when 'znode' is updated while it
        is still running 'action' for a previous update. The available
        modes are:
        QUEUE: Queue the watch until the currently running 'action'
            finishes, then run 'action' with the most recent contents. If
            several watches are received while 'action' is running then
            only the last update will be executed.
        DISCARD: Ignore the update and don't run the script, but
            re-register the watch.
        PARALLEL: Run the script in parallel for every update received.
        The default run mode is QUEUE.
    uid: The user id to run the process as (string or int). The default is
        to run as root.
    gid: The group id to run the process as (string or int). The default is
        to run as root.
    notify_signal: The signal that will be sent to the currently running
        spawned process when a new update has been received. By default
        this is not used, and it only works if the run_mode is QUEUE.
    timeout: The number of seconds before any spawned process should be sent
        a SIGTERM.
        Five seconds later the process will be sent a SIGKILL.
    description: A text description of the watch which will be used for
        logging. The default is to name watches after the file they are
        configured in.

Exec():
    Returns a lambda that will execute a given command when run.

    command: If this is a string then the command will be invoked in a shell
        interpreter. If it's a list then it will be executed exactly as
        specified.

By default the configuration files should be placed in /etc/twitcher and
should use an extension of ".twc".

4.1 Examples
============

In this example config a node called '/twitcher/example1' will be watched for
updates, and when it changes Twitcher will execute
"date >> /tmp/twitcher_example1.out".

    RegisterWatch(
        znode='/twitcher/example1',
        action=Exec('date >> /tmp/twitcher_example1.out')
    )

As another example, we can run a Python function rather than executing a
command. Here the function example2() is executed every time
'/twitcher/example2' is updated.

    def example2():
        open('/tmp/twitcher_example2.out', 'w').write('testing')

    RegisterWatch(
        znode='/twitcher/example2',
        action=example2
    )

Keep in mind that the action function is run in a process that has been
forked off from the main Twitcher instance. As such the following example
DOES NOT WORK AS EXPECTED. The file will always contain 1, because the
increment happens in the child process, which exits after doesnotwork()
finishes; the copy of x in the main Twitcher process is never changed.

    x = 0

    def doesnotwork():
        global x
        # Only the forked child's copy of x is incremented.
        x += 1
        open('/tmp/twitcher_badexample.out', 'w').write(str(x))

    RegisterWatch(
        znode='/twitcher/doesnotwork',
        action=doesnotwork
    )

4.2 Things to Avoid
===================

Any action run by Twitcher is expected to be idempotent. Since the script may
be executed at any time and can be triggered rapidly, each script is expected
to do basic health and sanity checking before performing expensive or
destructive actions.

Even if the script is in QUEUE mode there is a possibility that it may be run
twice.
If Twitcher is restarted while the script is running, the script will be
started again when Twitcher comes back up, which leaves two copies running at
the same time. To prevent this your script should use a file-based sigil (for
example a lock file) or another method to verify exclusivity.

5. Known Issues
===============

None at the moment.

6. Future Features
==================

Failure handling on the child process:
    Allow the config to specify what should be done in the case where the
    script fails (returns non-zero). Some examples include "retry x times",
    "fail but keep watch", and "fail and stop watching".

7. Change Log
=============

Release: 1.5
    2010/11/08: Make connection re-establish all existing watches.

Release: 1.4
    2010/10/25: Fix client session expiration recovery.
    2010/8/3: Documentation fixes.

Release: 1.3
    2010/8/2: Added notify_signal and timeout support and fixed a few bugs.

Release: 1.2, Open source under the Apache 2.0 license.
    2010/7/15: Added uid/gid support.

Release: 1.0
    2010/6/19: Initial review.

Release: 1.1
    2010/6/24: Fixed a bug that allowed child processes to get ignored
        forever.
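As a note on the exclusivity check recommended in section 4.2, one possible
approach is an advisory lock on a file. The following is a minimal sketch and
not part of Twitcher: the helper names acquire_exclusive_lock and
run_exclusively are hypothetical, and it assumes a Unix system where Python's
fcntl module is available.

```python
import fcntl


def acquire_exclusive_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, 'a')
    try:
        # LOCK_NB makes flock() raise immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        handle.close()
        return None
    return handle


def run_exclusively(lock_path, work):
    """Run work() only if no other copy holds the lock; report whether it ran."""
    lock = acquire_exclusive_lock(lock_path)
    if lock is None:
        return False
    try:
        work()
        return True
    finally:
        # Closing the file releases the lock, even if work() raises.
        lock.close()
```

An action could then wrap its real work, for example
action=lambda: run_exclusively('/var/run/twitcher_example.lock', example2),
where the lock path is again only a placeholder. The lock is released when
the holding process exits, so a crashed script does not leave it held.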