
Deploy design

Introduction

`deploy’ is a deployment tool which operates in the same domain as Chef, Puppet, Cfengine and Cfruby. I have used them all. In fact, I created Cfruby with David Powers many years ago; the project probably started in 2005 and sloccount estimates 2.5 years of development. Cfruby is more of a Cfengine clone that works without a server, and I have been using it to manage work environments, mail servers and web servers all those years. Cfruby is showing its age. It has good ideas, but somehow it is (still) too complicated or too simple (depending on how you look at it). Also I want it to run faster and use parallelisation.

So, what do I want out of a deployment tool? First of all it has to copy configuration files from a git repository to defined places on a host. That, by far, is the most important idea (I thank Kent Skaar for introducing me to Cfengine around 2000) and requires a place to store files and a mapping to the destination.

Second, I like the idea of (overlapping) machine classes as introduced by Cfengine. So you can say that a host is part of the webserver class and also runs ssh (the ssh-server class). The mapping and global configuration will be in YAML (this time, sorry David Powers for ignoring your idea).

Third, we want to be able to list properties of a machine. This means we have to have an intermediate representation. Something that Cfruby misses and GNU Guix has.

Fourth, we want fine grained output that can be configured. Cfruby went some way here, but I did not use it much because it was not good enough.

Fifth, we don’t want a client/server environment like the others have. I strongly believe in simple tools and a runner + git checkout should reflect the state of each machine. The simple runner can be run via ssh to create a client/server setup.

Sixth, the configuration has to be simple. Really simple. In Cfruby we created a Cfengine-type language. The problem with that is that you have to understand that DSL. I think the idea of combining YAML with Ruby is the way forward. Guile S-EXP would be even better (perhaps), but we can add that as a front-end later. The intermediate representation should help there.

Seventh, configuration can happen in parallel to speed things up, but I said that already.

Eighth, transaction support. Every ‘module’ should either complete fully or not at all. The machine state should be the state before or the state after, not something in between. Ah yes, the concept of modules and file locking is there too.

`deploy’ will be part of the GNU Guix setup, so install Guix and you should be set to run this system with a simple:

deploy git://mygitserver/myenvironment.git

Will anyone use this? I don’t know. The point is that it saves me a lot of time to have this type of functionality on the systems I manage. Sometimes (indeed) it is easier to write your own software.

Design of `deploy’

A configuration `language’

Configuration will be a combination of YAML and simple Ruby and/or Guile, depending on needs and preferences. You can write deploy definitions with YAML and Ruby, YAML and Guile, or all three. Scripts can be written in any language. All `deploy’ does is provide the framework.

The basic framework is written in Ruby, the reason being that we have so much useful code already in Cfruby. In time, some of it may migrate to Guile. Personally I am not too concerned about mixing languages - thanks to GNU Guix there is no deployment issue there.

Intermediate representation

An intermediate representation of actions, a `bag’, allows us to create views on that data and to check SHA values (i.e. changes to files). The intermediate representation also comes in handy for rolling back and for deciding on parallel execution.

The intermediate form is, in effect, the full-blown representation of the data tree that is written in YAML. So, if you copy a directory, the intermediate representation (a bag in Guix speak) contains all the files that are listed for copying, their modes, ownership and destination. Note it is an expansion only - the files in the bag are not compared for differences etc. That is a separate step.

The idea is to separate expansion from checking and execution. Also the expanded list can be shown and studied, if required, without checking and execution, or with checking only.
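To make this concrete, here is a hedged sketch of what two expanded bag entries for a directory copy might look like; the field names, paths and layout are illustrative only, since the bag format is not fixed yet:

- copy:
    source: masterfiles/ssh/sshd_conf
    dest: /etc/ssh/sshd_conf
    mode: "0400"
    user: "root"
    group: "root"
- copy:
    source: masterfiles/ssh/ssh_known_hosts
    dest: /etc/ssh/ssh_known_hosts
    mode: "0644"
    user: "root"
    group: "root"

Checking (comparing SHA values against what is on disk) and execution then operate on this expanded list.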

Transactions

Transactions are surprisingly hard to achieve. The idea is to have either the start or end state, but not what happens in between. Think of what happens during an install when electricity fails.

GNU Guix (and its sibling Nix) have the right idea: Create a new directory for files and symlink to that in the final step. Nothing is visible until the symlink changes. Rolling back is simply replacing a symlink (btw. the much older GNU Stow does something similar with symlinks).

We will use GNU Guix for some stuff. So that is tactic one. Inside the repository there may be a ./guix directory containing definitions.

Tactic two is to use the same symlink idea, but to implement it ourselves.

Tactic three is to use git and roll-back on git.

Tactic four is to capture error conditions and roll back.

Just as an example, I use stunnel which has a configuration file in etc/stunnel. With GNU Guix that file would be encoded and hosted inside the stunnel store path. Rather clean!

If we were to use a symlink ourselves, we would make /etc/stunnel a symlink to /etc/deploy/stunnel/stunnel-1; that would work too. Arguably the directory should be /var/deploy/stunnel, but I think in this case I prefer to see what is happening in /etc.

With git it is harder. The information for backtracking would be contained in etc.git (which I also use) and would have to be restored manually.

With the final option, capturing error conditions, we can maintain the previous state (somewhere) and write back the originals if the transaction does not complete. The main problem is disruptions to a running install, i.e., what happens in case of a power failure? Maybe we should skip this option.

Another point of consideration is checking the final result. Actions in modules have to be ordered in such a way that the final result is always the same (say, when one module copies a clean file and a second module edits that file, the first should not overwrite the second).

Ordering

In other words, modules need to be ordered and their commands too. The heuristic is to put mkdir first, followed by file copies, followed by file edits. Modules working on the same paths should be in the same transaction. More than one mkdir is not allowed, nor more than one file copy. With file edits the actions are sorted by module name (unless an explicit dependency is defined), followed by occurrence in the command file.
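To make the heuristic concrete, here is a hedged sketch of a single module whose commands are written out of order; the comments only indicate the order in which they would be executed, and the command names match the examples further down:

- edit-file:          # executed third
    sshd_conf:
- file-copy:          # executed second
    sshd_conf:
- dir: "/etc/ssh"     # executed first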

Modules

A module is an independent entity. For example configuring vim is independent of configuring emacs. A module is also a transaction.

We should be able to specify what modules to run or skip. For example a directory cleaning module could normally be skipped.

Parallelism

Independent modules can be executed in parallel because modules are transactions.

Interference and locking

Modules interfere when they need to edit the same file, e.g., firewall rules or hosts.allow. One of the mistakes we made in Cfruby is that when two ‘modules’ changed a file there could be a conflict. Cfruby had no locking in place and that would lead to trouble.

Another aspect of interference is between two different deployment runs. If we are to track different runs, a database should be in place. At this point I think we should opt for a NoSQL database in /var/deploy or $HOME/.deploy/ (when run as a normal user), which allows for locking between different runs and for roll-backs too.

Fine grained output

Output should be fine grained and easy to control. Too much output and people stop reading it. Output should be written to disk on completion of a module, collected at the end of a run and sorted by priority for display. All messages can be filtered and configured per individual module.

Error levels should be:

Fatal error

Any error that breaks the install.

Error: Non-fatal errors

Normally a module halts on Error, but this can be overridden by a command line switch. The transaction is rolled back. Other modules continue.

Warning

Just a warning.

Debug

Debug information.

Info

General info. Within Info we can have multiple levels, each of which should be selectable individually or in any combination (a hypothetical configuration sketch follows the list below).

  1. bag: show general points of entry (e.g. copy-file).
  2. compare: show decision steps taken, e.g. compare SHA value
  3. action: actual actions taken, e.g. copying-file
  4. skip: show unchanged - i.e. actions skipped
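How this selection is configured is still open; purely as a hypothetical sketch (none of these keys exist yet), per-module filtering could look something like:

output:
  default: info
  info:
    - bag
    - action
  modules:
    ssh: debug          # show everything for the ssh module
    cleanup: warning    # only warnings and errors for the cleanup module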

Install GNU Guix, ruby, git and deploy

Note: GNU Guix is optional for running `deploy’. But if you want sane and reproducible system installation, GNU Guix is the way to go. Also, we will (sometimes) use GNU Guix for deployment itself. GNU Guix shines when it comes to transactions, for example.

Follow the tar installation instruction on the Guix website. Basically, download the tarball, unpack it, copy the relevant dirs to /gnu and /var/gnu, add the relevant groups and users, and start the Guix daemon.

Once Guix is running, update Guix:

guix pull

and now we need guix, guile, ruby and git to run deploy:

guix package -i guix guile ruby git 

and set the path:

 export PATH="/root/.guix-profile/bin:/root/.guix-profile/sbin"
 export GEM_PATH="/root/.guix-profile/lib/ruby/gems/2.2.0"

Note that if you deploy these tools to multiple freshly installed servers it may be worth using the guix archive functions to speed things up, or even create your own tarball of guix (make sure to include the database in /var).

To install deploy (for now) we check out the git repo itself:

git clone git://github.com/pjotrp/deploy.git

And you should be able to run

./deploy/bin/deploy

Setting up a deployment definition

The first step is to set up a git repository to store the definition. Here we are going to set up two examples, one for a server installation and one for a HOME directory. Unsurprisingly I use both.

Classes

First the server. In the fresh git repo we add a YAML file named ‘classes.yaml’ that defines the host and the classes it belongs to. E.g.

ssh: any
guix: any
webserver: myhost01 myhost02
firewall: webserver

Here myhost01 is a hostname or a group of hosts, a class in itself. If you run deploy on myhost01 it will recognise that the host belongs to the classes webserver, firewall, guix and ssh (a simple expansion).
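Purely as an illustration (the output format is not defined), that expansion amounts to:

myhost01:
  - ssh
  - guix
  - webserver
  - firewall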

Classes (effectively groupings) are important for registering functionality, but also for defining physical networks (DNS access) and giving different access to machines (hosts.allow).

With Cfruby and Cfengine classes were defined differently, but I like this approach because it clearly lists what a machine should be doing. Note: classes can be higher level abstractions and the host can also be ‘any’, so this git repository definition is relevant to all machines. A class can contain machines (webserver) and other classes (firewall).

To run this file simply point to the base directory or git repo, i.e.

deploy serverrepo

which will pick up classes.yaml from the ./serverrepo/ directory.

Modules

Modules are self contained (in principle independent) installation descriptions. A module can create dirs, install software, copy files, edit files, etc. An ssh installation would be one module. A webserver would be one module. An emacs or vim configuration in the HOME directory would be one module. Modules are simply listed in the ‘config’ directory. The config directory is walked to find modules.

In principle modules are independent so they can run in any order. It is possible, however, to state that one module depends on another with the require descriptor. So a git webserver can depend on git.

At runtime the dependencies are ordered for execution.
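The exact syntax for require is not settled; as a hedged sketch, a hypothetical git-webserver module could declare its dependency at the top of its config file (the module and file names here are made up):

# config/git-webserver.yaml (hypothetical module)
- require: git
- dir: "/var/www/git"
- file-copy:
    gitweb.conf: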

Config files

The convention for config files (a.k.a. modules) is that they reside in repository/config/*.yaml.

Master files

The convention for masterfiles is that they are relative to repository/masterfiles/module/. If that module dir is missing, the masterfiles are simply relative to repository/masterfiles/.
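Putting the conventions together, a small deployment repository could look something like this (the layout is illustrative; only classes.yaml, config/ and masterfiles/ are prescribed):

serverrepo/
├── classes.yaml
├── config
│   └── ssh.yaml
└── masterfiles
    └── ssh
        └── sshd_conf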

Intermediate format

When the config files are parsed `deploy’ won’t run immediately. Instead it creates an intermediate representation, a `bag’, with all the files and options expanded. These are reordered for later processing.

Commands

Copy files

So a module for ssh could copy the sshd_config file for a certain class. The convention is to store such files in ./masterfiles/class/filename. In ./config/ssh.yaml we could define

- dir:
    /etc/ssh:
      mode: "0755"
      user: "root"
      group: "root"
- file-copy:
    sshd_conf:
      mode: "0400"

Actually the settings are defaults, so you can do

- dir: "/etc/ssh"
- file-copy:
    sshd_conf:

Note that the last dir used gets picked up as the destination; this makes for the short notation.

The Guile S-EXP version will be even simpler because we can remove the duplication. But that is for later.

Classes

Now say we don’t want to install sshd on all servers - it is just an example.

We define a class named sshd in classes.yaml containing myhost01:

sshd: myhost01

This means that when running the ssh module on myhost01 we want it to install; otherwise it is skipped. Now ssh.yaml should be something like

- class: sshd
- dir: "/etc/ssh"
- file-copy:
    sshd_conf:

The class command basically says: honour the following commands until the next class command.
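As a hedged illustration of that scoping, a single module file could serve more than one class; whether ‘any’ is accepted here as a catch-all, and the second file name, are assumptions:

- class: sshd
- dir: "/etc/ssh"
- file-copy:
    sshd_conf:
- class: any
- file-copy:
    ssh_known_hosts: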

Copy multiple files

An emacs configuration in $HOME could look like

---
- file-copy:
    emacs:
      dest: .emacs
      mode: "400"
- dir:
    .emacs.d:
    .emacs.d/lisp:
- file-copy:
    emacs.d/lisp/markdown-mode.el:
- dir:
    .emacs.d/org:
- file-copy:
    emacs.d/org/ox-rss.el:
    emacs.d/org/toc-org.el:
- dir:
    .emacs.d/themes:
- file-copy:
    emacs.d/themes/dark-blue-theme.el:
    emacs.d/themes/zenburn-theme.el:

but there is a simpler version. We can copy files with recursion this way

---
- file-copy:
    emacs:
      dest: .emacs
      mode: "0400"
- dir:
    .emacs.d:
      source: emacs.d
      recursive: true

which copies the directory structure in masterfiles/emacs/emacs.d to ~/.emacs.d/, as in

./masterfiles/emacs/
├── emacs
└── emacs.d
    ├── lisp
    │   └── markdown-mode.el
    ├── org
    │   ├── ox-rss.el
    │   └── toc-org.el
    └── themes
        ├── dark-blue-theme.el
        └── zenburn-theme.el

This greatly simplifies copying. The .emacs file, however, needs to be specified separately because it goes directly into $HOME.

Edit files

The most common edits are switching and/or appending configuration flags, e.g.

---
- edit-file:
    sshd_conf:
      edit-lines:
        - replace:        ^PasswordAuthentication \w+
        - with:           PasswordAuthentication no
        - append-unique:  AllowUsers user
        - replace:        AllowUsers \w+
        - with:           AllowUsers user

These are line edits: they replace all occurrences of the password authentication setting, append the AllowUsers line if it is missing, and edit it afterwards. Note that we are using regular expressions for scanning.

Note, btw, that we will introduce parametrization later so `user’ can be fetched from outer scope settings.

Use cases

Configuration files

The first use case is configuring a tool that has a config file in /etc. In this case we’ll configure vpnc.

vpnc expects a file /etc/vpnc/default.conf.

We create the file and store it in a git repository named vpns/. In there we have a classes.yaml containing something like

classes:
  - vpnc
machines:
  any:
    - vpnc

So anyone running this repository will get vpnc configured.

In config/vpnc.yaml we’ll have

guix:
  - vpnc
dir:
  - "/etc/vpnc":
      mode: "0700"
file-copy:
  - default.conf:
      dest: "/etc/vpnc"
      mode: "0400"

So GNU Guix installs the latest software package and default.conf gets copied from ./masterfiles/vpnc/default.conf into the destination with appropriate permissions.

A future version of `deploy’ will actually create a versioned directory in /etc/deploy/vpnc/vpn-1 and symlink to that, to make the change transactional and allow for roll-backs.