workflow.py is a minimalist file-based workflow engine. It runs as a background process and can automate tasks such as deleting old files, emailing you when new files are created, or running a script to process new files.
Configuring and Starting the workflow
- create a file workflow.config using the syntax below
- run workflow.py in that folder
Options
-f <path> the folder to monitor and process.
-s <seconds> the time interval between checks for new files.
-n <name> the current filename, defaults to
-x <path> lets you specify the config file to use (workflow.config)
-y <path> lets you specify the cache file to use (workflow.cache.db)
-l <logfile> lets you specify a logfile (else console output)
-d daemonizes the workflow process
-c <rulename> does not start the workflow but clears a rule (see below).
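To illustrate how the options combine, here is a minimal sketch that assembles a workflow.py command line from Python. The helper name build_command and its parameters are hypothetical; only the flags themselves come from the list above, and the sketch assumes workflow.py sits in the current folder.

```python
import sys

def build_command(folder, seconds=None, config=None, cache=None,
                  logfile=None, daemon=False):
    """Assemble an argv list for workflow.py (hypothetical helper)."""
    cmd = [sys.executable, "workflow.py", "-f", folder]
    if seconds is not None:
        cmd += ["-s", str(seconds)]   # interval between checks
    if config:
        cmd += ["-x", config]         # alternative config file
    if cache:
        cmd += ["-y", cache]          # alternative cache file
    if logfile:
        cmd += ["-l", logfile]        # log to a file instead of the console
    if daemon:
        cmd.append("-d")              # daemonize
    return cmd

# Show the command without the interpreter path.
print(" ".join(build_command("incoming", seconds=5, daemon=True)[1:]))
# workflow.py -f incoming -s 5 -d
```

The resulting list can be passed to subprocess.Popen to start the workflow.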
workflow.config consists of a series of rules with the following syntax
rulename: pattern [dt]: command
rulename is the name of the rule (it cannot contain spaces).
pattern is a glob pattern for the files to monitor.
dt is a time interval (default is 1 second). Only files last modified more than dt seconds ago are considered.
command is the command to execute for each file that matches pattern, was last modified more than dt seconds ago, and has not been processed already. If the command ends in &, it is executed in the background; otherwise it blocks the workflow until it completes. The name of the matching file can be referred to in the command as $0. Multiline commands can be continued with
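The rule syntax can be made concrete with a small parser. This is a sketch only, not the parser used by workflow.py: the names parse_rule, RULE_RE and UNIT_SECONDS are hypothetical, the pattern is assumed to contain no spaces, and only the unit suffixes that appear in this README (s, h, d) are accepted.

```python
import re

# Units seen in this README's examples (1s, 1h, 1d); other suffixes are
# not documented here, so this sketch does not accept them.
UNIT_SECONDS = {"s": 1, "h": 3600, "d": 86400}

# rulename: pattern [dt]: command   (the [dt] part is optional)
RULE_RE = re.compile(
    r"^(?P<name>\S+):\s*(?P<pattern>\S+)\s*"
    r"(?:\[(?P<dt>\d+)(?P<unit>[shd])\])?\s*:\s*(?P<command>.+)$"
)

def parse_rule(line):
    """Split one rule line into (rulename, pattern, dt_seconds, command)."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError("not a rule: %r" % line)
    dt = 1  # documented default: 1 second
    if m.group("dt"):
        dt = int(m.group("dt")) * UNIT_SECONDS[m.group("unit")]
    return m.group("name"), m.group("pattern"), dt, m.group("command").strip()

print(parse_rule("delete_old_logs: *.log [1d]: rm $0"))
# ('delete_old_logs', '*.log', 86400, 'rm $0')
```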
Examples of rules
Delete *.log files older than one day
delete_old_logs: *.log [1d]: rm $0
Move *.txt files older than one hour to another folder
move_old_txt: *.txt [1h]: mv $0 otherfolder/$0
Email me when a new *.doc file is created
email_me_on_new_doc: *.doc: mail -s 'new file: $0' email@example.com < /dev/null
Process new *.dat files using a Python script
process_dat: *.dat: python process.py $0
Create a finite state machine for each matching file
rule1: *.src [1s]: echo > $0.state.1
rule2: *.state.1 [1s]: mv $0 `expr "$0" : '\(.*\).1'`.2
rule3: *.state.2 [1s]: mv $0 `expr "$0" : '\(.*\).2'`.3
rule4: *.state.3 [1s]: rm $0
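To see which file names the four state machine rules produce, the transitions can be traced in plain Python without touching the disk. The functions step and trace are hypothetical helpers that mirror the renaming done by the shell commands; they are not part of workflow.py.

```python
import re

def step(name):
    """Return the file name produced by the rule matching `name`, or None."""
    if name.endswith(".src"):                  # rule1: create <name>.state.1
        return name + ".state.1"
    m = re.fullmatch(r"(.*\.state)\.([12])", name)
    if m:                                      # rule2, rule3: bump the state
        return "%s.%d" % (m.group(1), int(m.group(2)) + 1)
    return None                                # rule4 removes *.state.3

def trace(name):
    """List the successive state files created for a source file."""
    states = []
    current = step(name)
    while current is not None:
        states.append(current)
        current = step(current)
    return states

print(trace("job.src"))
# ['job.src.state.1', 'job.src.state.2', 'job.src.state.3']
```

Each transition waits for the [1s] interval of its rule, and the last state file is removed by rule4.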
When a file matches a pattern, a new process is created to execute the corresponding command. The pid of the process is saved in <filename>.<rulename>.pid. This file is deleted when the process completes. If the process fails, its output and error log are saved in <filename>.<rulename>.err. If the process does not fail, the output is stored in <filename>.<rulename>.out.
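The bookkeeping around a single command can be sketched as follows. This is an illustration under stated assumptions, not the actual workflow.py code: run_rule is a hypothetical name, $0 is assumed to be substituted textually before the shell runs the command, a non-zero exit status is taken to mean failure, and the sketch always blocks (it does not handle commands ending in &).

```python
import os
import subprocess

def run_rule(filename, rulename, command):
    """Run `command` for `filename`, leaving .out or .err files behind."""
    base = "%s.%s" % (filename, rulename)
    # Assumption: $0 is replaced textually by the matching file name.
    shell_command = command.replace("$0", filename)
    proc = subprocess.Popen(shell_command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Record the pid while the command is running.
    with open(base + ".pid", "w") as f:
        f.write(str(proc.pid))
    output, _ = proc.communicate()
    os.remove(base + ".pid")
    # Assumption: a non-zero exit status counts as a failure.
    suffix = ".out" if proc.returncode == 0 else ".err"
    with open(base + suffix, "wb") as f:
        f.write(output)
    return proc.returncode
```

For example, run_rule("a.dat", "process_dat", "python process.py $0") would leave either a.dat.process_dat.out or a.dat.process_dat.err in the current folder.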
If a file has already been processed according to a certain rule, this information is stored in the file workflow.cache.db and the file is not processed again unless:
- the mtime of the file changes (for example you edit or touch the file)
- the rule is cleaned up.
You can clean up a rule with
python workflow.py -c rulename
Or you can delete the workflow.cache.db file. In this latter case all rules will run again when you restart the workflow.
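The caching behaviour described above can be modelled with a dictionary keyed by rule and file. This is only a sketch of the logic: the on-disk format of workflow.cache.db is not described here, so an in-memory dict stands in for it, and the function names are hypothetical.

```python
import os

def is_processed(cache, rulename, path):
    """True if `path` was processed by `rulename` at its current mtime."""
    return cache.get((rulename, path)) == os.path.getmtime(path)

def mark_processed(cache, rulename, path):
    """Remember the mtime at which `path` was processed by `rulename`."""
    cache[(rulename, path)] = os.path.getmtime(path)

def clear_rule(cache, rulename):
    """Forget every file processed by `rulename` (like the -c option)."""
    for key in [k for k in cache if k[0] == rulename]:
        del cache[key]
```

Editing or touching a file changes its mtime, so is_processed returns False again; emptying the dict corresponds to deleting the cache file.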
If the main workflow.py process is killed or crashes while some commands are being executed, they are killed as well. You can find which files and rules were being processed by looking for <filename>.<rulename>.pid files. If you restart workflow.py, those pid files are deleted.
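Leftover pid files can be listed with a few lines of Python before restarting workflow.py. The helper interrupted_jobs is hypothetical, and it assumes that rule names contain no dots, so that the last dot in the stem separates the file name from the rule name.

```python
import glob
import os

def interrupted_jobs(folder):
    """Return (filename, rulename) pairs for leftover .pid files."""
    jobs = []
    pattern = os.path.join(glob.escape(folder), "*.pid")
    for pidfile in sorted(glob.glob(pattern)):
        stem = os.path.basename(pidfile)[:-len(".pid")]
        # Assumption: rule names contain no dots, so the last dot
        # separates the file name from the rule name.
        filename, _, rulename = stem.rpartition(".")
        jobs.append((filename, rulename))
    return jobs
```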
If a rule results in an error and a <filename>.<rulename>.err file is created, the file is not processed again according to that rule unless the error file is deleted.
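To retry every file that failed under one rule, the corresponding error files can be removed in bulk. The sketch below assumes the error files live in the monitored folder; retry_failed is a hypothetical helper, not a workflow.py feature.

```python
import glob
import os

def retry_failed(folder, rulename):
    """Delete <filename>.<rulename>.err files; return the names removed."""
    removed = []
    pattern = os.path.join(glob.escape(folder),
                           "*." + glob.escape(rulename) + ".err")
    for errfile in sorted(glob.glob(pattern)):
        os.remove(errfile)
        removed.append(os.path.basename(errfile))
    return removed
```

Error files that belong to other rules are left in place.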
If a file is edited or touched and the rule runs again, the <filename>.<rulename>.out file will be overwritten.
Unless otherwise specified, each file is processed 1s after it was last modified. It is possible that a different process is still writing the file but pauses for more than 1s between writes (for example, the file is being downloaded over a slow connection). In this case it is best to download the file under a name that does not match the pattern and rename it to its proper name once the write is complete. This must be handled outside of workflow, which has no way of knowing whether a file is complete.
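The write-then-rename approach can be sketched as follows for a producer written in Python. The function name and the .part suffix are assumptions; the suffix must be chosen so that the temporary name matches no pattern in workflow.config.

```python
import os

def write_then_rename(final_path, chunks, suffix=".part"):
    """Write data under a temporary name, then rename it into place."""
    tmp_path = final_path + suffix       # must not match any rule pattern
    with open(tmp_path, "wb") as f:
        for chunk in chunks:             # chunks may arrive slowly
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())             # make sure the data is on disk
    # The file appears under its final name only once it is complete.
    os.replace(tmp_path, final_path)
    return final_path
```

Because the rename happens only after the last write, a rule matching the final name never sees a partially written file.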
If the workflow.config file is edited or changed, it is reloaded without the need to restart the workflow.