Ansible is an automation tool used to install software and configure systems. It's frequently used either standalone or in concert with other operations-focused software.
The primary advantage of Ansible is that it is "target neutral" - the system being configured can be a physical device, a virtual server, a VM image, a Docker container image, etc., which reduces duplication of effort. For example, the executors used in Jenkins can be Packer images or physical servers. Being able to share the same roles to install and configure software on both helps reduce overhead and increase uniformity.
It can also target almost any system that has a command line interface, even unusual platforms with non-Unix CLIs such as networking equipment or Windows.
A good place to start learning about Ansible is the Intro to Playbooks.
Additional useful references:
- Ansible Best Practices: The Essentials.
- high quality ansible playbooks.
- Jinja2 template documentation
- filters
- module return values
We're following the Linux Foundation hierarchy and naming scheme for Ansible roles - see the ansible/role/* repos on LF Gerrit, and the LF Ansible guide.
This aligns with how things work across a variety of tools and systems - see the Gerrit docs on replication, specifically the `remote.NAME.remoteNameStyle` section. If/when these are replicated on GitHub, that config will cause the repo to be renamed from `ansible/role/<rolename>` to `ansible-role-<rolename>`, which goes along with how Ansible Galaxy has traditionally named roles.
Playbooks are run from within a Python virtualenv, to ensure that the correct versions of all dependencies are available. The `galaxy` target will create this virtualenv and also download dependent roles and collections from Ansible Galaxy:
$ make galaxy
...
$ source venv_onfansible/bin/activate
Once you've done this, you can run the `ansible-playbook` command.
Playbooks are stored in the `playbooks` directory. Note that playbooks can be organized in this way, but the `*_vars` directories must be relative to either the inventory or playbook files, and any `files` directories must be relative to the root directory or `playbooks`.
The convention for naming playbooks is `<purpose>-playbook.yml`.
Inventory sources are stored in the `inventory` directory.
A typical invocation would be:
$ ansible-playbook -i inventory/<source>.ini playbooks/static-playbook.yml
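As a rough illustration of what such a playbook might contain, here is a minimal sketch. The `static` group name and the use of the `nginx` role are assumptions made for the example, not the actual contents of `static-playbook.yml`.

```yaml
---
# playbooks/static-playbook.yml (hypothetical contents)
- name: Configure static web hosts
  hosts: static  # assumed group name defined in the inventory source
  become: true
  roles:
    - nginx  # assumed role; substitute the roles the hosts need
```

The `hosts` value must match a host or group defined in the inventory file passed with `-i`.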
1. Create the virtualenv with the Makefile, and source the activate script:

       $ make venv_onfansible
       ...
       $ source venv_onfansible/bin/activate

2. Run cookiecutter with the path to the role cookiecutter template:

       $ cd roles
       $ cookiecutter ../cookiecutters/role

   Answer the questions given, especially the name, which will be the name of the role, and it will create a role directory with those answers. The default answers will result in Ubuntu 16.04 and 18.04 Molecule tests, using a Docker image that runs the systemd init system, to allow daemons to be run in the container.

3. Initialize git and commit the files as created by cookiecutter:

       $ cd <rolename>
       $ git add .
       $ git commit -m "initial role"

4. Lint and test the role with `make lint` (runs static checks) and `make test` (tests the role with Molecule). This should be done before making changes, to make sure the test process works locally on your system.

5. Make changes to the role, periodically running the lint and test commands from the previous step. See the Testing section below for how to run Molecule tests incrementally.

6. Add comprehensive tests to the `molecule/default/verify.yml` file. See the `nginx` role as an example.
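A verify playbook can be as simple as gathering facts and asserting on them. The following is a minimal sketch, assuming the role under test installs a package named `nginx`; it is not the actual `verify.yml` from the `nginx` role.

```yaml
---
# molecule/default/verify.yml (illustrative sketch)
- name: Verify
  hosts: all
  tasks:
    - name: Gather facts about installed packages
      package_facts:
        manager: auto

    - name: Check that the package was installed
      assert:
        that:
          - "'nginx' in ansible_facts.packages"  # assumed package name
```

Further assertions (listening ports, config file contents, service state) can be added as extra tasks in the same play.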
Use the `.yml` extension for all YAML files. This is the convention used by most Ansible roles, and by tools like Galaxy or Molecule when autogenerating a role.
Ansible roles and playbooks should pass both `ansible-lint` and `yamllint` in strict mode, to verify that they are well structured and formatted. `yamllint` in particular differs from most Ansible examples when it comes to booleans - lowercase `true` and `false` should be used instead of other "truthy" values like `yes` and `no`. There are some cases where an Ansible module will require that you use these "truthy" values, in which case you can disable `yamllint` for just that line. `ansible-lint` can also be disabled per-line or per-task, but this should be avoided when possible.
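The boolean convention and a per-line `yamllint` exception might look like the following sketch. The second task uses a made-up module name (`hypothetical_module`) and parameter purely to show where the disable comment goes.

```yaml
---
- name: Ensure nginx is started and enabled
  service:
    name: nginx
    state: started
    enabled: true  # lowercase boolean passes the yamllint truthy rule

- name: Pass a literal "yes" where a module insists on it
  hypothetical_module:  # placeholder, not a real module
    confirm: yes  # yamllint disable-line rule:truthy
```

The `disable-line` comment applies only to the line it sits on, so the rest of the file is still checked.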
If you need to split a long line to pass lint, make use of the YAML `>` folded block scalar syntax, which replaces newlines with single spaces (good for wrapping long lines), or the `|` literal block scalar syntax, which retains newlines (good for inserting multiple lines of text into the output). More information is available at yaml multiline strings. The flow scalar syntax is less obvious and makes it easier to accidentally introduce mistakes, so using it isn't recommended.
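A short sketch of both block scalar styles follows. The paths and file contents are illustrative assumptions. The `-` after `>` is a chomping indicator that strips the trailing newline from the folded string.

```yaml
---
- name: Run a long command, wrapped with a folded scalar
  command: >-
    rsync --archive --delete
    /srv/source/
    /srv/destination/

- name: Write a multi-line file with a literal scalar
  copy:
    dest: /etc/motd
    content: |
      Welcome to this host.
      It is managed by Ansible.
```

The first task runs a single-line `rsync` command; the second writes a two-line file with the line break preserved.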
While ansible-lint tends to direct you to solutions that improve your roles most of the time, the 503 warning may introduce additional complexity and may be skipped.
When listing parameters within a task, put each parameter on its own line (the YAML style). Even though there are examples of the `key=value` one-line syntax for assigning parameters, avoid using it in favor of the YAML syntax. This makes diffs shorter and easier to inspect, and helps with linting.
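For comparison, here is the same task in both styles. The `/etc/myapp` path is a hypothetical example.

```yaml
---
# Avoid: key=value one-line syntax
# - name: Create the config directory
#   file: path=/etc/myapp state=directory mode=0755

# Prefer: one parameter per line
- name: Create the config directory
  file:
    path: /etc/myapp
    state: directory
    mode: "0755"
```

With one parameter per line, changing `mode` later produces a one-line diff instead of touching the whole task.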
Roles have two places to define variables - `defaults` and `vars`. The major difference between these is how variable precedence works in Ansible.
In general, you should only define variables that will never need to be overridden by a user or playbook (for example platform-specific or OS-specific variables) in the `vars/<platformname>.yml` files. The `defaults/main.yml` file should contain example variables or default values that work across all platforms supported by a role.
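One way to lay this out is sketched below as three separate files, shown as separate YAML documents. The `myapp_*` variable names and values are hypothetical, and loading the platform file by `ansible_os_family` is an assumption about how a role might select it.

```yaml
---
# defaults/main.yml - values a user or playbook may override
myapp_version: "1.2.3"
myapp_listen_port: 8080

---
# vars/Debian.yml - platform-specific values, not meant to be overridden
myapp_package_name: myapp
myapp_service_name: myapp

---
# tasks/main.yml - load the vars file matching the target platform
- name: Include platform-specific variables
  include_vars: "{{ ansible_os_family }}.yml"
```

Values in `defaults/main.yml` have the lowest precedence, so inventory, `group_vars`, or playbook variables override them without any extra work.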
To ensure the integrity of artifacts and other items downloaded from the internet as a part of the role, you should provide checksums and keys as a part of the role. Some examples of this are:
- Using the `checksum` field on get_url and similar modules. This will also save time, as during subsequent runs if the checksum matches an already-downloaded file, the download won't be required.
- For package signing keys and GPG keys, put them as files within the role and use a file lookup when using the apt_key and similar modules. `apt_key` requires an "ASCII Armored" GPG key to be used with it - if upstream provides a binary version, convert it with `gpg --enarmor file.gpg`, which creates a `file.gpg.asc` version.
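Both techniques can be sketched as follows. The URL, file names, and checksum placeholder are assumptions for illustration; the real checksum must be filled in from a trusted source.

```yaml
---
- name: Download the release tarball, verifying its checksum
  get_url:
    url: "https://example.com/myapp-1.2.3.tar.gz"  # hypothetical URL
    dest: /tmp/myapp-1.2.3.tar.gz
    checksum: "sha256:<sha256 of the tarball>"  # placeholder, fill in

- name: Add the package signing key shipped with the role
  apt_key:
    # hypothetical file stored in the role's files/ directory
    data: "{{ lookup('file', 'myapp-signing-key.gpg.asc') }}"
    state: present
```

Because the key is read from the role's own `files` directory, the role never has to trust a key fetched over the network at run time.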
When optionally executing a task using `when`, it's easier to follow if you put the `when` condition right after the name of the task, not at the end of the action as is shown in many examples:
- name: Run command only on Debian (and Ubuntu)
  when: ansible_os_family == "Debian"
  command:
    cmd: echo "Only run on Debian"
The `with_items` and other `with_*` iterators should be put at the end of the task.
Handlers should be named `<action>-<subject>` for consistency - examples: `restart-nginx` or `start-postgres`.
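A handler following this naming scheme, and a task that notifies it, might look like the sketch below. The template and destination paths are assumptions.

```yaml
---
# tasks/main.yml
- name: Install the nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify:
    - restart-nginx

---
# handlers/main.yml
- name: restart-nginx
  service:
    name: nginx
    state: restarted
```

The string under `notify` must match the handler's `name` exactly, which is one reason a consistent naming scheme helps.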
If you are iterating on lists that contain passwords or other secure data that should not be leaked into the output, set `no_log: true` so the items being iterated on are not printed.
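A minimal sketch of a loop over sensitive items is shown here, with the iterator placed at the end of the task. The `myapp_users` variable and its `name` and `password_hash` fields are hypothetical.

```yaml
---
- name: Create service accounts
  user:
    name: "{{ item.name }}"
    password: "{{ item.password_hash }}"
  no_log: true
  with_items: "{{ myapp_users }}"
```

With `no_log: true`, the per-item output is censored, so a failure will show less detail; temporarily removing it on a test system can help when debugging.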
All templated files should contain a commented line with `{{ ansible_managed }}`, to indicate that the file is managed by Ansible, when it was created, and by what user.
Avoid using `tags`, as these are generally used to change the behavior of a role or playbook in an arbitrary way - instead use information derived from setup to control optional actions, or use different roles to separate which tasks are run. Use of tags other than the `skip_ansible_lint` tag will cause the lint to fail. See also "Ansible: Tags are a code smell" for additional perspective.
If you need to modify behavior in a platform-specific way, use the setup facts to determine which tasks to run. You can get a list of setup facts by running `ansible -m setup` against a target system.
Do not change the default value of the `hash_behaviour` variable - the default `replace` setting is more deterministic, easier to understand, and handles removal of items, none of which can be achieved with other values of this setting.
Generally, roles are split into two major groups:
These roles configure or set up basic system functionality or do basic scripting and maintenance of the system.
Examples:
- Configuration of the "base system" (anything that is pre-installed
by the default installation)
- Configuring cron, logging, etc.
- Adding scripts for system tasks like backup
- Creating user accounts (see the `provision-users` role)
- Changing network settings (Firewall, VPN, etc.)
These are roles that add software to the base system, in various ways, and should install and configure software that is not automatically installed in the base installation.
Examples:
- Installing software like `nginx`, `acme.sh`, or `postgres`
- Creating limited privilege role accounts for running the software
- Configuring the software installed
The `group_vars` directory should contain variables specific to a sub-classification of hosts - this is usually done on a per-site or per-function basis. These files are named `<groupname>.yml`.
The `host_vars` directory contains variables specific to a host, and should have files named `<hostname>.yml`.
Inventory is the list of hosts and the groups they are assigned to. Currently these lists are kept in flat files in the `inventory` directory, but in the future they'll be dynamically built from NetBox or a similar IPAM system.
All YAML files (including Ansible playbooks, roles, etc.) are scanned with `yamllint`.
All Ansible playbooks and roles are scanned with `ansible-lint`. Occasionally, you may run into issues that look like this:
CRITICAL Couldn't parse task at molecule/default/verify.yml:27 (couldn't
resolve module/action 'community.mysql.mysql_query'. This often indicates a
misspelling, missing collection, or incorrect module path.)
This happens when `ansible-lint` can't find the correct collection. To resolve this, set the `ANSIBLE_COLLECTIONS_PATHS` variable to the ansible directory - for example:
export ANSIBLE_COLLECTIONS_PATHS=~/Documents/onf/infra/ansible
Python code is formatted with black, and must pass flake8 and pylint (py3k compat check only).
Tests are done on a per-role basis using Molecule, which can test the role against Docker containers (the default) or Vagrant VMs (more complicated to set up).
If the role will run a daemon, you should request that the container is run in privileged mode, which will run an init daemon to start the services (in most cases, `systemd`). The `-priv` Docker images that are used include a working copy of `systemd`, like a physical system would have, and are created from the `paulfantom/dockerfiles` repo.
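In a Molecule scenario that uses the Docker driver, privileged mode is requested per platform. The fragment below is a sketch only: the image reference is a placeholder rather than the actual image used, and the generated `molecule.yml` in a role may contain additional settings.

```yaml
---
# molecule/default/molecule.yml (fragment, illustrative)
driver:
  name: docker
platforms:
  - name: instance
    image: "<registry>/<systemd-enabled-image>:<tag>"  # placeholder
    privileged: true
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
```

Here `pre_build_image: true` tells Molecule to use the image as-is instead of building one from a Dockerfile template.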
If the role depends on other roles to function (for example, it needs a database or JRE), you can install those other roles in the `prepare.yml` playbook. See the `netbox` role for an example. This prepare playbook will only be run once during the initial setup of the container/VM, not every time `converge` is run.
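A prepare playbook that pulls in a dependency could be sketched like this. The `postgres` role name is an assumed dependency for illustration, and this is not the actual `prepare.yml` from the `netbox` role.

```yaml
---
# molecule/default/prepare.yml (illustrative sketch)
- name: Prepare
  hosts: all
  become: true
  roles:
    - postgres  # assumed dependency; must be findable on the roles path
```

Since prepare runs only once per instance, changes to this file take effect after a `molecule destroy` followed by a fresh `molecule converge`.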
Individual steps of the test process can be run from Molecule - see the test sequence commands.
The most frequently used commands during role development are:
- `molecule converge`: Bring up the container and run the playbook against it
- `molecule verify`: Run the `verify.yml` playbook to test
- `molecule login`: Create an interactive shell session inside the container/VM to manually debug problems
- `molecule destroy`: Stop/destroy all the containers
- `molecule test`: Run all the steps automatically
A common devel loop when editing the role is to run:
molecule converge; molecule verify
If you need more verbose output from the underlying Ansible tools, add the `--debug` flag to the `molecule` command, which will pass the `-vvv` verbose parameter to `ansible-playbook`.
The setup module isn't consistent across operating systems with the `ansible_processor_*` options. OpenBSD has quoted numbers for quantities, Linux does not. `ansible_processor_count` is sockets on Linux, but the same as the number of cores on OpenBSD. There are also sometimes differences between Linux distros - YMMV.
There are similar issues with network interface configuration - on Linux, `ansible_eth0['ipv4']` is a dict, but it's a list on OpenBSD.
Currently ansible-lint throws exceptions when using modules from collections, which makes checking some playbooks difficult with that tool. This primarily affects the NetBox related tasks.
This repo does not pass the REUSE check because of REUSE issue 246.