Rob Nagler edited this page Apr 2, 2019 · 7 revisions

RSConf

RSConf is a policy-rich configuration management system with a very specific set of goals:

  • High-level opinions baked in (CentOS 7, docker started via systemd, etc.)
  • Fail-fast, atomically with debuggable context
  • Configure in as few YAML files as feasible or as many as you like
  • Program in programming languages (not YAML or Jinja)
  • Build everything on the master first and download with Curl/Bash
  • The master holds a copy of all configuration for the clients
  • Manage secrets automatically and persistently
  • Default mode is pull from client; master push via ssh (if desired, but not required)
  • Just a configurator, not a remote execution engine (that's what ssh is for)
  • Single master serving dev/alpha/beta/prod channels (stages)
  • Updates to files, container images, etc. cause server restarts in proper order
  • Zero-config: client only needs bash, curl, and credentials
  • Explicit coupling between dependencies (components import and call each other)
  • Server requirements minimal: nginx and python
  • Can operate serverless so master can be bootstrapped using curl file://

These goals are different from Ansible, Salt, and other configuration management systems. We discuss our issues with Ansible and Salt in our DevOps Wiki.

Basic Operation

The master uses rsconf build to create a tree of host-specific files. Each host has its own private tree, which mirrors its file system. For example, /etc/motd is accessible via /host/hostname/etc/motd. Clients have access only to their own trees, and symlinks are used to bring in certain files.
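
As a rough illustration of that mirroring, the sketch below maps a path on a client to its location under the host's tree. The function name host_tree_path is made up for this example and is not part of RSConf.

```python
from pathlib import PurePosixPath


def host_tree_path(hostname: str, client_path: str) -> str:
    """Map an absolute path on a client to its location in that host's tree.

    Hypothetical helper: it only illustrates the /host/<hostname>/... layout.
    """
    path = PurePosixPath(client_path)
    if not path.is_absolute():
        raise ValueError(f"client path must be absolute: {client_path}")
    # Drop the leading "/" so the path nests under /host/<hostname>
    return str(PurePosixPath("/host", hostname, *path.parts[1:]))


print(host_tree_path("v3.radia.run", "/etc/motd"))  # /host/v3.radia.run/etc/motd
```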

In normal operation, clients run the command rsc as root. rsc is a simple wrapper for a curl installer, which consists of two bash libraries: install.sh and rsconf.sh. Once these libraries load, they run a function that downloads /host/hostname/000.sh, which is the root configuration script on the master.

Configuration

Configuration is found in two files: /srv/rsconf/db/000.yml and /srv/rsconf/db/secret/000.yml. The latter is intended only for secrets, although either file can contain anything. You can have more files if you like; just name them 0*.yml so they get processed in order.
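
One way to picture "processed in order" is a sorted glob over the two directories. This is only a sketch: the function config_files is hypothetical, and reading the secret directory after the main one is an assumption based on secrets taking precedence over non-secrets.

```python
from pathlib import Path


def config_files(db_dir: str) -> list[Path]:
    """Return 0*.yml files in name order, non-secret files first.

    Hypothetical helper; the real loader in RSConf may order files differently.
    """
    root = Path(db_dir)
    plain = sorted(root.glob("0*.yml"))
    # Assumption: secrets are read last so that they can override non-secrets
    secret = sorted((root / "secret").glob("0*.yml"))
    return plain + secret
```

Calling config_files("/srv/rsconf/db") on a master would then list 000.yml before any later files such as 010.yml.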

The YAML configuration files are structured into three sections in increasing precedence: default, channel, and host. A channel is one of: dev, alpha, beta, and prod. A host resides in only one channel.

A simple configuration looks like:

default: {}

channel:
  dev:
    var1: 1
    var2: 2

host:
  dev:
    v3.radia.run:
      var1: 3

The host v3.radia.run resides in the dev channel. The value of var1 will be 3, because it is overridden in the host section. var2 will be 2, inherited from the channel section. There are no defaults in this example.
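
The precedence rule can be sketched as a layered merge. The functions merge and host_values below are illustrative names, and merging nested dictionaries recursively is an assumption; RSConf's actual merge logic may differ.

```python
def merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from override win.

    Assumption: nested dicts are merged recursively rather than replaced.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def host_values(db: dict, channel: str, host: str) -> dict:
    """Apply default, then channel, then host, in increasing precedence."""
    values = db.get("default") or {}
    values = merge(values, db.get("channel", {}).get(channel) or {})
    return merge(values, db.get("host", {}).get(channel, {}).get(host) or {})


# The same data as the YAML example above, written as a Python dict
db = {
    "default": {},
    "channel": {"dev": {"var1": 1, "var2": 2}},
    "host": {"dev": {"v3.radia.run": {"var1": 3}}},
}
print(host_values(db, "dev", "v3.radia.run"))  # {'var1': 3, 'var2': 2}
```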

Channels

A channel is a way of promoting software delivery through test systems. The first test stage is alpha. If a software package passes the alpha tests, it is promoted to beta, and then on to prod. dev is used only on development machines, so packages get rebuilt before they are labeled as alpha.

The advantage of channels is that you get binary compatibility between various testing stages. To promote a package, a symbolic link is updated. It's really that simple.
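
A minimal sketch of promotion by symlink follows. The function promote and the link and target names are hypothetical, and the rename step is one common way to swap a link without a window in which it is missing; the source does not say how RSConf performs the update.

```python
import os


def promote(channel_link: str, target: str) -> None:
    """Point channel_link at target, replacing any existing link.

    Hypothetical helper: creates a temporary link, then renames it over the
    old one. os.replace is an atomic rename on POSIX file systems.
    """
    tmp = f"{channel_link}.tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, channel_link)
```

For example, promote("beta", "build-2") would make the beta link resolve to the same build that alpha already tested, without rebuilding anything.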

The dev channel is special, because packages are rebuilt on alpha, not promoted to alpha. The dev channel allows developers to operate within rsconf's ecosystem.

Build on Master

rsconf build generates an entire file tree (/srv/rsconf/srv/host) with all hosts. A build is atomic: either it works or it fails, and nothing is updated. This is probably the major design difference between RSConf and other configuration management systems.
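
The all-or-nothing behavior can be approximated by generating into a staging directory and swapping it in only on success. This is a sketch of that general technique, not RSConf's code: build_tree and its generate callback are hypothetical, and the final swap is two renames rather than a single atomic operation.

```python
import os
import shutil
import tempfile


def build_tree(live_dir: str, generate) -> None:
    """Run generate(staging_dir); replace live_dir only if it succeeds."""
    parent = os.path.dirname(os.path.abspath(live_dir))
    staging = tempfile.mkdtemp(prefix="build-", dir=parent)
    try:
        generate(staging)  # may raise; live_dir is untouched if it does
    except Exception:
        shutil.rmtree(staging)
        raise
    old = live_dir + ".old"
    if os.path.exists(old):
        shutil.rmtree(old)
    if os.path.exists(live_dir):
        os.rename(live_dir, old)  # keep the previous tree as a backup
    os.rename(staging, live_dir)
```

If generate raises for any host, the staging directory is discarded and clients keep downloading the previous tree.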

When rsconf build runs, it is debuggable. You don't have to look on the server and the client. You just have to look on the server. This avoids security issues related to communicating what went wrong to the person running the program. With Salt, for example, when a client asks for configuration, some of the generation happens on the server and other parts happen on the client. This means you have to examine and correlate two outputs: the server log and the client log. RSConf decouples generation from execution.

Furthermore, every attempt is made to avoid running complex software on the client. RSConf drops in files and restarts servers. That's what it is designed to do. Sometimes it has to run package managers or special configurators (e.g. sysctl), but for the most part, all the client does is download configuration and restart servers.

Python on Master, Bash on Client, Declarations in YAML

RSConf limits the programming to Python on the master and Bash on the client. The configuration is declared (not programmed) in a list of YAML files. Wherever possible, "options" are avoided in the YAML, that is, if there's a policy decision, it's made in the Python. The client executes the policy decisions made on the master in Python.

Many configuration management systems require you to program in YAML. This can lead to some very cumbersome code that crosses too many language boundaries and requires programmers to "think in YAML" instead of a more convenient programming language like Python.

Components

A component usually corresponds to a service. For example, postfix, nginx, and network are all components. This mirrors what goes on with a modern Linux system where system startup is organized into a collection of Systemd units or init scripts.

A component can require other components in which case those components are installed and started before the requiring component. The postfix component requires postgrey and spamd:

self.buildt.require_component('postgrey', 'spamd')

RSConf sorts topologically and checks there are no circular dependencies. An exception is raised if a requirement loop is detected, and the build is aborted. While this is only likely to happen during development, this is an example of RSConf's fail-fast philosophy, which prevents errors being ignored as they are by default with many configuration managers such as Ansible and Salt.
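
The ordering and loop detection can be illustrated with the standard library's graphlib module (Python 3.9+). This is a stand-in for whatever RSConf does internally; install_order is a hypothetical name, and only the postfix requirement comes from the text above.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(requires: dict[str, list[str]]) -> list[str]:
    """Return components ordered so that requirements come first.

    TopologicalSorter takes {node: predecessors}, which matches
    {component: components it requires}. A loop raises CycleError.
    """
    return list(TopologicalSorter(requires).static_order())


print(install_order({"postfix": ["postgrey", "spamd"]}))

try:
    install_order({"a": ["b"], "b": ["a"]})
except CycleError as exc:
    # Fail fast: abort the build rather than continue with a bad graph
    print("build aborted:", exc.args[0])
```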

Systemd

RSConf bakes in the decision that Systemd controls how services are managed. Systemd replaced System V init scripts in two major Linux distributions: CentOS 7 and Ubuntu 15.04. Since RSConf is designed to support CentOS 7, it made sense to bake in the support for Systemd.

This choice simplifies both the specification of the service and the code. For example, the configuration to specify the use of postfix is:

rsconf_db:
  components: [ postfix ]

The rest of the decisions about when the service is reloaded or started are left up to the component itself. The call of systemctl daemon-reload is therefore implicit and automatic, unlike other system configuration managers, which require the user to manage when systemctl daemon-reload is called.

Services

Service components have a unit configuration and a run directory, which is almost always /srv/service. Strict naming is required, that is, the run directory, the service, and the systemd unit files are all named exactly the same. For the majority of services, the component is named the same, too.

A service is automatically enabled and (re)started if it is not running or if its configuration has changed. Components tell RSConf to watch particular directories or files, with the run and unit files being watched implicitly. If a file that is being watched changes, the service will be restarted. RSConf restarts the service immediately if the service requests it with a call to rsconf_service_restart, e.g. postgresql has this at the end of its Python module:

self.rsconf_service_restart()

This will start all services that have pending restarts. Only one restart will occur per client execution.

To prevent a service with complex dependencies from restarting until the end, a service would specify

self.rsconf_service_restart_at_end()

The postfix and nginx components are restarted at the end, because other components may change files in their watched directories, and their services are not needed during execution of rsconf.

Systemd Timers

Systemd has support for timers, which are a replacement for cron. The main advantage of timers over cron is that you can start the timer services easily in a consistent environment. While cron does have a consistent environment, you cannot easily reproduce it from the command line. With Systemd timers, it is as easy as:

systemctl start logrotate

Timers are not a separately configurable component as yet. They are used as part of various components of the system. For example, there is a db_bkp component that runs on a timer, and executes db_bkp.sh scripts of services that install them.

Secrets

All configuration management systems have secrets. RSConf manages secrets in /srv/rsconf/db/secret, which contains YAML files like the db directory, and also files in any format that contain secrets. For example, there is an rsconf_auth file that contains the hashed credentials for all known RSConf client hosts and is installed on the rsconf master.

To help manage the database, secrets have a visibility level: global, channel, or host. When a secret is being accessed, the caller passes the appropriate level. For example, a dovecot password database is visible at the host level so that /etc/dovecot/users is unique for each host that runs the dovecot IMAP/POP service. Note that this visibility is only relative to the build on the master, that is, it is just for name space management during rsconf build. Only the master can see this database, and each client can see only its own files that are generated on the master from the secrets database.

Sometimes there's a need to have both the plain text and the hashed form of a secret. The plain text file for rsconf_auth is rsconf_auth.json, which contains a machine readable version of the secrets in plain text so that RSConf can create /root/.netrc files for the clients.

The need for both versions is subtle. The hashed passwords in rsconf_auth are salted individually. While rsconf_auth could be recreated on every build from rsconf_auth.json, RSConf needs to know if the file has changed (a new client host added) in order to notify the services that depend on it. Therefore, we save a copy in order to avoid churn.

The same is true for dynamically generated self-signed TLS certificates and other secrets in test and development environments. RSConf could regenerate them every build, but this would create too much churn. rsconf.component.install_secret_path supports the dynamic generation of secrets explicitly with an existence test.
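
The existence test amounts to "generate once, then reuse". The sketch below shows that idea only; secret_once is a hypothetical function and does not reproduce the signature or behavior of rsconf.component.install_secret_path.

```python
import os
import secrets


def secret_once(path: str, generate=lambda: secrets.token_urlsafe(32)) -> str:
    """Return the secret stored at path, generating it only if it is missing."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        # O_EXCL guards against overwriting a file created in the meantime;
        # mode 0o600 keeps the secret readable by the owner only.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as f:
            f.write(generate())
    with open(path) as f:
        return f.read()
```

Because later builds read the stored value back, the generated files stay byte-for-byte identical and dependent services are not restarted needlessly.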

How rsconf build Works

rsconf build creates the entire host tree for all hosts configured in /srv/rsconf/db. This is surprisingly fast, taking only a couple of seconds.

TODO: overview better:

  • read the db
  • rsconf_db.components

To help avoid bugs due to cross-contamination between host configurations, the build for each host involves reading all the YAML files to create a nested data structure (rsconf.db.Host class and instance hdb) of their merged contents. Precedence is important: secrets override non-secrets, host overrides channel, etc. To aid communication between components, the hdb is modified dynamically, which is why cross-contamination is possible.

The vast majority of host files are generated from Jinja2 templates. In order to avoid further cross-contamination, each component copies hdb to a j2_ctx (Jinja2 context) variable before populating the j2_ctx with generated values.
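
The reason for copying can be shown with plain dictionaries. This is a sketch: the real hdb is an rsconf.db.Host instance, and the keys other than rsconf_db.components are invented for the example.

```python
import copy

# Plain-dict stand-in for the merged host database
hdb = {
    "rsconf_db": {"components": ["postfix"]},
    "postfix": {"myhostname": "v3.radia.run"},  # hypothetical key
}

# A deep copy gives the component its own nested structure to modify
j2_ctx = copy.deepcopy(hdb)
j2_ctx["postfix"]["generated_banner"] = "ESMTP v3.radia.run"  # hypothetical value

# The generated value exists only in the template context, not in hdb
print("generated_banner" in hdb["postfix"])  # False
```

A shallow copy would not be enough here, because the nested postfix dictionary would still be shared with hdb.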

TODO:

mention jinja in yaml

Component Implementations

A component is a Python module in the rsconf.component package with a class called T that subclasses rsconf.component.T and must implement an internal_build method. This method is guaranteed to be called only once per host build.
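
The shape of a component might look like the sketch below. BaseT is a hypothetical stand-in for rsconf.component.T so the example runs without RSConf installed; the once-only guard, the build method, and the files attribute are assumptions made for illustration, not the real base class API.

```python
class BaseT:
    """Hypothetical stand-in for rsconf.component.T (not the real class)."""

    def __init__(self, name, hdb):
        self.name = name
        self.hdb = hdb
        self._built = False

    def build(self):
        # Run internal_build at most once per host build, however many
        # other components require this one.
        if not self._built:
            self._built = True
            self.internal_build()

    def internal_build(self):
        raise NotImplementedError("components must implement internal_build")


class T(BaseT):
    """Example component: records the file content it would generate."""

    def internal_build(self):
        self.files = {"/etc/motd": f"Welcome to {self.hdb['hostname']}\n"}


component = T("motd", {"hostname": "v3.radia.run"})
component.build()
component.build()  # second call is a no-op
print(component.files)
```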

Explicit coupling between dependencies