New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize Ansible playbooks #1196

Merged
merged 74 commits into from Nov 17, 2015

Conversation

Projects
None yet
3 participants
@conorsch
Copy link
Contributor

conorsch commented Nov 12, 2015

Substantial changes to the flow of the Ansible playbooks and structure of the roles. Most of the diff is due to files that have been moved or renamed—functional changes were kept to a minimum in order to facilitate review. Once merged, future changes to the Ansible roles will be easier to make and to review.

Major changes

  • Consolidates host-specific shared roles (e.g. common-app, common-mon -> common)
  • Use group_vars for staging and development hosts
  • Manage group membership in manage-groups role for production instances
  • Manage group membership in Vagrantfile for testing VMs
  • Support ANSIBLE_ARGS environment variable for managing tags and skipped tags on testing machines

Minor changes

  • More readable task names (also added task names for plays)
  • Several improvements to idempotence (e.g. command calls only report changed when appropriate)
  • Typo fixes (e.g. iptables comments)

Happy to review in person since the diff is large. Once this PR is merged, there's a list of ops-tagged issues outstanding that I plan to knock out posthaste.

conorsch added some commits Aug 11, 2015

Makes FPF apt repository installation depend on common role
Uses roles/common/meta/main.yml to set the dependency.
Establishes group_vars dir for staging hosts
Provides a more idiomatic method for managing vars
specifically for the -staging hosts.
Updates tests for cron-apt reboot timing.
Matches changes to the Ansible config described
in #1071 and resolved by #1088.
Updates tests for Postfix security settings.
As of #1100, the Ansible config defaults to
trusting system CAs for verifying the SMTP relay,
rather than requiring a fingerprint for the TLS cert.
The spectests now check for the new settings.

The Ansible config still supports the fingerprint
verification setup, though, so the old checks are
left in the tests file commented out, to make
conditional tests easier down the road. I'd rather
have the tests fail once on a nonstandard config,
then be easily tweakable, then add lots of conditional
logic to the test suite that would itself require testing.
Stages changes for Postfix mail queue lifetime.
The changes described #1102 haven't made it into
a release yet, so don't test for the relevant line.
I was already editing these spectests, though, so
it was convenient to add the line and leave it
commented out until we actually want to test for it.
Silences vagrant deb package download in test script.
Downloading and installing vagrant has been reliable,
so we don't need to the verbosity at this point. The
snap-ci web interface prints a new line for every
refresh on the download rate (likely because it doesn't
understand carriage returns), which means the first
several pages of output for the development host
are simply downloading vagrant. In the event of a failed
build, that's a lot of scrolling to get to the relevant
information.

Since we're using the `-q` flag on wget , errors with the
vagrant deb package download will still be visible.
Bumps vagrant version version used in snapci pipeline.
Iterating 1.7.2 -> 1.7.3, conservatively. If no problems,
can try again on 1.7.4.
Falls back to Vagrant 1.7.2.
The snapci instances ship with bundler 1.10.6,
which is too new for vagrant:

    Bundler could not find compatible versions for gem "bundler":
      In Gemfile:
        vagrant (= 1.7.3) ruby depends on
          bundler (<= 1.10.5, >= 1.5.2) ruby

      Current Bundler version:
        bundler (1.10.6)
    This Gemfile requires a different version of Bundler.
    Perhaps you need to update Bundler by running `gem install bundler`?
    Could not find gem 'bundler (<= 1.10.5, >= 1.5.2) ruby',
    which is required by gem 'vagrant (= 1.7.3) ruby', in any of the sources.

A longterm fix will control the gem version available
within snapci, but for now let's hold on to 1.7.2.
Adjusts directory permissions check for development host.
Includes verbose comment with TODO. The problem is
the Ansible config not specifying permissions correctly,
and further complicated by the /vagrant directory
being mounted with different permissions. Likely fixing
the Ansible task that sets permissions on the logo will
resolve the permissions disparity, but will need to test
to make sure. For now, the test is passing and will fail
when the Ansible task changes, which is intended.
Breaks out grsecurity config into separate role
Moves grsec-specific vars into role defaults,
breaks apt package cleanup into separate tasklist.
Moves Tor configuration to a generalized role
Removed some old vars. The "securedrop_tor_user" var
was redundant, just reiterated "tor_user" value.
The `tor_DataDirectory` var wasn't used at all, so replaced it with
`tor_hidden_services_parent_dir` throughout.

Generalizes Tor config values for App and Mon host_vars
by using a dict, which can be iterated over within the role,
ensuring THS/ATHS settings are unique per host.
Generalizes "common" role for App and Mon servers
Moves `apt_repo_url` var to FPF repo role.
Removes unused `common-app` role from staging playbook.
Makes new tor config role a dependency of `common`.
Folds /etc/hosts setup into common role.
Generalizes restrict-direct-access role
Rather than have slightly different roles for App and Mon
servers, let's leverage host_ and group_vars so we can
consolidate the roles for reuse.

Removes superfluous 2FA variable, which was only used
in one place, and therefore doesn't need to be an
external var.
Fixes formatting of ATHS files.
The `restrict-direct-access` role has been consolidated,
so it's no longer appropriate to hardcode hostnames
and service names. Instead, use the `tor_instances`
variable, which varies by host.
Generalizes iptables configuration
Most of the iptables rules are shared among hosts, so it
makes sense to use a single template, so rules aren't copy/pasted.
In addition to the shared rules, host-specific rules are added
by checking group membership for each host.
Changes host_vars -> group_vars for servers
Using group_vars vastly simplifies the environment tiers
(i.e. development, staging, and prod). It will, however,
require changes to the prod-only "inventory" file, which
should be handled with care. It's also possible to handle
the group membership for the prod hosts in a task as part
of the prod-specific playbook, which may be useful as a failsafe.
Moves staging-specific vars file into group_vars
Reduces a lot of repetition in the staging playbook.
The app-staging and mon-staging hosts are now properly members
of a "staging" group, which enables more dynamic handling of vars
throughout the Ansible config.
Consolidates plays for staging hosts
Makes host limits more concise, and removes redundant
roles targeting the App and Mon servers separately, since
we have more consolidated roles now.

Uses the new "staging" hostgroup to target both
app-staging and mon-staging with the restrict/allow
direct access roles.
Uses group_vars for development and build hosts
Adds 'development' group to group_vars and Vagrantfile.
The 'development' and 'build' VMs share vars anyway,
so it makes sense to collapse them for easier reuse.

In the app-test role, rather than skipping the "non-development"
tag by default on the development host, only include the skipped
tasks if the current host is in the "staging" group.
Makes xvfb service setup idempotent in app-test role
Previously the update-rc.d command was reporting "changed"
always. Now the task registers the output of the command
and checks stdout for text indicating that the symlink
was already present.
Makes haveged config changes idempotent in app-test role
Previously the `lineinfile` task adjusting the DAEMON_ARGS
var in /etc/defaults/haveged appended the line every time,
causing the playbooks to always report "changed" status for
this task, and therefore always fire the handler for restarting
the haveged service. The duplicate lines don't seem to impair
the service, but it's poor practice.
Fixes "changed" status when importing GPG keys
Previously the task was always reporting changed, but now
it registers the output and checks for `gpg` reporting
that the key was already present. If the key was already present,
then changed is false.
Uses group membership for skipping dev tasks in app role
A note on the formatting of the `when` conditional here.
There's a substantial difference between these two conditionals:

   when: inventory_hostname in groups.foo
   when: "'foo' in group_names"

The former is more readable, but requires the referenced group
to be declared in the inventory file (or elsewhere), whereas the
the latter works even if the group doesn't exist.
Adds role dependencies to common role
The following roles are now conditionally enabled
via common/meta/main.yml:

  - grsecurity
  - install_fpf_repo
  - install_local_pkgs

Previously you needed to skip the "grsec" tag in order
to prevent the grsecurity tasks from being run. Now that the grsecurity
tasks are encapsulated in their own role, toggling the role
is controllable via `--extra-args grsecurity=false`.

Including the install_local_pkgs role can be toggled via the
`install_local_packages` var, which defaults to true in the
`securedrop` group_vars. The role is still included in the staging
playbook, but disabled by default.
Makes /etc/hosts config idempotent
Previously a malformed lineinfile declaration caused
the corresponding host entry to be appended on every
provisioning run, which meant the task was always marked as changed,
and the hosts file became quite cluttered after a long time.
Consolidates checking for grsecurity status
The grsecurity role has several conditional tasks
that check whether the currently running kernel has had
grsecurity patches applied. Therefore grsec_lock file
should be checked once, and its registered variable
used by subsequent tasks within the role, rather than
shelling out and registering multiple vars.
Fixes "changed" status for disabling swap space
Simply put, register the current state via a read-only
command, then compare that to the results of the swapoff
command. A minor improvement in idempotence, but does
improve the honesty of the "changed" output when running
the playbooks.
Adds dynamic group support for prod hosts
Adding group membership information to the production
inventory file would be tricky, causing the Admin to have
stash/pop, possibly more than once, which is terrible UX.
For now, we'll simply check for "app" and "mon" hosts,
as those names are strongly recommended in the docs,
and add those hosts to the appropriate hostgroup.
Adds tags for grsecurity dependency in common role
In certain testing environments such as DigitalOcean, we skip the
grsecurity role via --skip-tags. Rather than sprinkling the "grsec" tag
throughout the grsecurity role, we can tag the include itself and skip
based on that.

@conorsch conorsch force-pushed the conorsch:ansible-reorg branch from 0613b3d to 577bd93 Nov 13, 2015

conorsch added some commits Nov 13, 2015

Moves kernel module blacklisting into common role
Conceptually it makes sense to lump the blacklisting of kernel modules
with the other kernel-related tasks, specifically grsecurity, but in
testing environments we sometimes skip the grsecurity role, and we don't
want to skip the blacklisting in those cases. Therefore we'll include
the blacklisting as part of "common", where it will run all the time
every time.
Explicit includes for grsecurity and tor config roles
Setting them as dependencies of common causes them to be run first,
which isn't ideal. Instead, we'll set the install-fpf-repo as a
dependency of common and grsecurity (since the securedrop-grsec package
is installed from the FPF apt repo). Ansible role only run the common
role once by default, since we're not setting `allow_duplicates: yes` in
the meta/main.yml file for common.
Updates spectest logic for finding droplet IP addresses
The VirtualBox networking setup defaults to NAT on eth0, so the IPv4
address on eth0 isn't what's used for the the OSSEC registration. For
VirtualBox hosts, use eth1, otherwise eth0 by default.
Makes GPG key import tasks idempotent
GPG reports "imported: 1" to STDERR, otherwise "unchanged: 1". Check for
the "imported" string and only mark the task as changed if found.

@conorsch conorsch referenced this pull request Nov 14, 2015

Closed

Fix install_local_pkgs role #1040

5 of 5 tasks complete
Idempotent fetching of hidden service files
Rather than fetch-then-format, which results in two changed tasks every
time the playbook runs, we can read the file contents, then write the
local file in precisely the format we want. The cat-then-write approach
ensures that the tasks are marked "changed" only if the file contents
have indeed been changed, which is what we want.
@@ -1,9 +1,8 @@
---
- name: add cron job to cleanup SecureDrop temp dir daily
- name: Add cronjob to clean SecureDrop tmp dir daily.

This comment has been minimized.

@ageis

ageis Nov 14, 2015

Contributor

I still call it a cron job, not a cronjob ;)

This comment has been minimized.

@conorsch

conorsch Nov 16, 2015

Contributor

Ha! No contest, I'll change it.

conorsch added some commits Nov 16, 2015

Allow full test suite to be run locally
Previously the snap_ci scripts required several manual tweaks for
running offline. It's cleaner to detect the Snap-CI environment via the
"SNAP_CI" env var and automatically clean up droplets if detected. If
not detected, the droplets are retained, which makes repeated use of the
script for running all spectests much more convenient during
development, particularly of the Ansible playbooks.
Cleans up duplicate /etc/hosts entries
A previous version of the lineinfile task managing /etc/hosts was
needlessly appending duplicate lines. The task has been adjusted to use
a smarter regex, but that's not enough, since the new regex will update
the last match in the file, and /etc/hosts parsing returns the first
match—will could be outdated.

Therefore we'll need to duplicate the lines in the file, rewrite the
file, then perform the substitution. This more careful approach requires
three tasks rather than one, but ensures idempotence.
Cleans up duplicate entries for haveged defaults
A previous task was appending redundant config lines, due to a poorly
structured regex. The regex has been fixed, but we still need to clean
up any previously appended lines, to ensure that `lineinfile` continues
to work as expected (it only performs one substitution). So, we'll first
remove any duplicate lines from /etc/default/haveged, then run the
lineinfile replacement to make sure the args are set as intended.
@conorsch

This comment has been minimized.

Copy link
Contributor

conorsch commented Nov 17, 2015

@garrettr @ageis Addressed the issues discussed during in-person review. Regarding the idempotence changes for /etc/hosts, the config now removes duplicate lines from the file, then performs the lineinfile substitution. Requires three tasks rather than one, but ensures idempotence and also cleans up after old, deprecated tasks.

Implemented the same three-task approach for cleaning up /etc/default/haveged, as well.

@garrettr

This comment has been minimized.

Copy link
Contributor

garrettr commented Nov 17, 2015

@conorsch lgtm!

conorsch added a commit that referenced this pull request Nov 17, 2015

Merge pull request #1196 from conorsch/ansible-reorg
Reorganizes Ansible playbooks

@conorsch conorsch merged commit 3600315 into freedomofpress:develop Nov 17, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 63.835%
Details

This was referenced Nov 30, 2015

This was referenced Dec 8, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment