Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated web application configuration migration #1966

Open
garrettr opened this issue Jul 12, 2017 · 17 comments
Open

Automated web application configuration migration #1966

garrettr opened this issue Jul 12, 2017 · 17 comments
Assignees
Labels
epic Meta issue tracking child issues goals: packaging needs/discussion queued up for discussion at future team meeting. Use judiciously. ops/deployment

Comments

@garrettr
Copy link
Contributor

The SecureDrop web application's configuration is read from config.py, which is generated from a template, config.py.example, by the initialize_securedrop_app task during the initial run of the Ansible playbooks.

During the initial development of the SecureDrop auto-upgrade strategy, we faced a dilemma in managing config.py. The dilemma is mostly captured in comments (1, 2) in the relevant Ansible role, but I will summarize it here for clarity.

At the time of development, the most straightforward way to provide an auto-upgrade capability for SecureDrop was to use Debian packages. The SecureDrop application code is provided by the securedrop-app-code package, and production instances check https://apt.freedom.press periodically for updates to this package. Files that contain instance-specific configuration cannot be included in the Debian package because they would be overwritten on update.

Additionally, the instance-specific configuration depends on configuration values that must be provided by the SecureDrop administrator during installation, so the task of generating and maintaining the web application configuration fell to the Ansible playbooks, which are configured with vars by the administrator during installation and run from the Admin Workstation.

Finally, there are some configuration values which automatically generated by the Ansible playbooks. For security, these values must be unique, random, and generated from a secure CSPRNG (e.g. /dev/urandom). These values are:

  • source secret key—the Flask SECRET_KEY for the Source Interface, used as an HMAC key to secure Flask's sessions from tampering.
  • journalist secret key—same, but for the Journalist Interface.
  • scrypt id pepper—random value used as an instance-wide salt (aka "pepper") for deriving Source's filesystem ID's from their codenames.
  • scrypt gpg pepper—random value used as an instance-wide salt (aka "pepper") for deriving Source's reply keypair passphrases from their codenames.

Once set, it is critical that these values not be modified during the lifetime of the instance. To be specific:

  • {source,journalist} secret key could be modified with only minor disruption to the operation of the instance. Changing these values would invalidate all current sessions, which means any currently logged in users would encounter errors and be required to log in again.
  • scrypt {id,gpg} pepper cannot be modified without severe disruption to the operation of the instance. Since these values are required to derive source ID's and reply keypair passphrases from their codenames, modifying them would make it impossible for sources who had created their accounts prior to the modification to access their accounts.

The Ansible playbooks do not copy these values back to the Admin Workstation as part of the installation process, although they are included in backups made by the Ansible backup role. Because it is so important to avoid accidentally overwriting these automatically generated values, the Ansible playbooks currently take a defensive approach and avoid interacting with config.py at all if it is found to already exist on the system.

Automated migration of the SecureDrop web application configuration is prevented by the confluence of all of these factors. In order to make it possible, we need to decide:

  1. How should configuration be canonically represented?
    • Configuration is not just the values of variables in config.py: it is also any files that may be referenced by config.py, e.g. the GPG key whose fingerprint is referenced in JOURNALIST_KEY.
    • We have discussed how using a Python file to store configuration is problematic because it can (and currently does) contain conditional logic, which makes it harder to reason about the configuration and makes it hard to separate concerns.
  2. Where should the canonical representation of the configuration be stored (e.g. the Admin Workstation, the servers, both)? Any component of the system that can auto-migrate the configuration will need a copy of the canonical representation to handle all possible migrations.
  3. What components of the system should be allowed to automatically migrate the configuration?
    • This is tricky. On one hand, it could be useful to allow the securedrop-app-code package to automatically migrate the web application configuration. On the other hand, this creates a potential need for conflict resolution between disparate representations of an instance's configuration.

Clearly, this is a complex issue and further discussion and design effort is warranted. I think it should be possible to break this issue up into discrete sub-issues and resolve it incrementally, but we will need to carefully devise a plan to do so.

@ageis
Copy link
Contributor

ageis commented Jul 13, 2017

My two cents, at least on question ♯2, is that there should be a system-wide configuration file. (/etc/securedrop/config). Obviously, for the purpose of running in development mode, a "current working directory" file will take precedence if the system one is not present.

On ♯1 and ♯3, you might consider putting it under VCS (like a tool such as etckeeper does) on the server and potentially representing your config updates as patches/diffs that are applied and committed by the .deb in the postinst script. Not sure if the variance in config.py is enough that you'd have to generate new patches locally on the fly for each instance, but you should be able to ship patches that apply cleanly for everybody. This of course touches upon static/functional changes only, but a different process leveraging Ansible would be necessary to update the dynamically generated/random/custom values. You could have git bundle copy the config directory repository to the Admin Workstation every time there's an Ansible run as part of the backup role. Maybe even throw it in a vault if any protection is needed.

Admittedly, working with patches would be sticky, especially because of indeterminism throughout the install base and since one cannot expect the lineinfile module to make substitutions and insertions in consistent locations, so this is only a brainstorm. For extra sanity checking, you can use test plus grep to ensure the presence of certain lines (go download and check out the GitLab Debian package for a great example of this). Assuming you stick with Python instead of switching to a simpler ENV var flat file-like format (like the contents of/etc/default—which I believe would be easier to maintain), then also run syntax checks against it every chance you get.

More seriously worth considering is that you can store some variables persistently in debconf or dconf with ease. If it happens to work for your security model to put the keys and peppers in the debconf settings database, then the app code package is able to rewrite the config anew on every update instead of, or perhaps in addition to, backing it up in version control. This way, both Ansible and the app code .deb are able to query debconf for the canonical values. There should be a field type (string, I think) which can handle multi-line GPG keys without issue. I guess the idea is that you can sort've duplicate some logic between Ansible deployments and dpkg updates in a two-pronged approach to ensuring a usable yet fluidly changeable config. That is, you set custom values during the initial setup, subsequently write them out via a template whenever the package is installed or updated, and I guess hope that you don't have to release a version featuring a new custom configuration variable (as opposed to a random secret—there's no problem with using the postinst to generate those if absent), which seems like it'd always require an Ansible run to either include or alter.

It goes without saying that in order to convert existing installs to any of these scenarios you necessarily require a script in the postinst or an automation role or some tags you tell admins to run, whatever is the preferred migration strategy.

To me it makes a lot of sense to tie the configuration strongly to the application code package, for compatibility among other reasons, even though that is not what happens in practice, as it's managed by the app role. And in fact, dpkg has explicit features which make it convenient to ship patches with your program, as well as features for conflict resolution, managing configs and rendering templates. I seem to recall some resistance to relying on the control scripts for stuff, but a lot of software does this quite extensively and so long as you're still with this distribution method it's the straightest way to go that doesn't always require admin interaction.

@kushaldas
Copy link
Contributor

kushaldas commented Mar 16, 2018

I have the following idea, please comment what do you all think.

New configuration file

We will have a static configuration file at /etc/securedrop/sdconfig.cfg. This will use the old school configuration syntax, (INI file), This will ensure that we can use the Python standard library to read this configuration in our codebase (without adding any new dependency).

To create a config file for new installation

This step is straight forward, using the same Ansible style we have, we can create a new configuration file.

How to create the new file on the existing deployments/installs

Have a post install Python script, which will read (by standard python import) the current configuration file (the config.py), and it will create a new configuration file in the /etc/securedrop/ location. Now, we know that there are a few attributes, which may be missing in the configuration. But, we also have some default values for those in our code. This config generation script will use default values if any particular configuration option is missing in the deployment (say, SESSION_EXPIRATION_MINUTES or SUPPORTED_LOCALES) to create the new configuration files.

What are the possible changes in the application code (to handle the new configuration)?

Only the sdconfig.py will require update as it is the file which creates a configuration for the rest of code.

We will also have to update our Ansible logic to push out any changes to that new configuration file.

What are the benefits?

The first two comments already talk about standard benefits. I will talk about a new point of view for the developers of the project.

This change will enable us to remove all dynamic attribute access from the code (remember the most popular #3033). Means we will be able to have a stable definition of the SDConfig class, and no more magic default values in different places of the code.

I will also update this ticket with an example configuration file by Monday.

@msheiny
Copy link
Contributor

msheiny commented Mar 16, 2018

This will use the old school configuration syntax

@kushaldas you mean ini format right? Just talking out loud here for everyone's benefit as it seems thats what you mean. ansible also has good support for that - so thats another benefit 👍

@kushaldas
Copy link
Contributor

@kushaldas you mean ini format right? Just talking out loud here for everyone's benefit as it seems thats what you mean. ansible also has good support for that - so thats another benefit +1

Correct, the standard INI file (not the windows style though) 🚌

@kushaldas
Copy link
Contributor

kushaldas commented Mar 19, 2018

Here is an example configuration file created by code from my config.py in development VM.

[JournalistInterface]
secret_key = xh/nRcAkil76bgfhvjk7cwwAX4fuu2nqOQBhmp1CCNo=
session_cookie_name = js

[SourceInterface]
secret_key = rM4F6Zj0fYaZrBmQjb9hf/n1Te8xC3CTQGGB3FdaACw=
session_cookie_name = ss

[Database]
database_engine = sqlite
database_file = /var/lib/securedrop/db.sqlite

[Scrypt]
scrypt_gpg_pepper = L/PsZFeq0r70G8jE32dJ5WO864aqTpXDyODo2mHdpy8=
scrypt_id_pepper = 2gOjOifkQkhb61YGd8q0reo3daHSCUyiOkGkGA8Sifk=
scrypt_param_n = 16384
scrypt_param_r = 8
scrypt_param_p = 1

[SecureDrop]
securedrop_data_root = /var/lib/securedrop
securedrop_root = /home/kdas/code/securedrop/securedrop
gpg_key_dir = /var/lib/securedrop/keys
journalist_templates_dir = /home/kdas/code/securedrop/securedrop/journalist_templates
source_templates_dir = /home/kdas/code/securedrop/securedrop/source_templates
store_dir = /var/lib/securedrop/store
temp_dir = /var/lib/securedrop/tmp
adjectives = /home/kdas/code/securedrop/securedrop/./dictionaries/adjectives.txt
nouns = /home/kdas/code/securedrop/securedrop/dictionaries/nouns.txt
word_list = /home/kdas/code/securedrop/securedrop/wordlist
worker_pidfile = /tmp/securedrop_worker.pid
session_expiration_minutes = 120
journalist_key = 65A1B5FF195B56353CC63DFFCC40EF1228271441
supported_locales = en_US,fr_FR
default_locale = en_US

The env attribute of the config.py will be directly created inside of the SDConfig.py (using the same code),

# Modify configuration for alternative environments
env = os.environ.get('SECUREDROP_ENV') or 'prod'

If any required configuration value is missing, SDConfig.py will have the details values, that will reduce magic default values in our code, and there will be only one place to find them.

@heartsucker
Copy link
Contributor

Yo tossing this out here. Maybe we should use YAML and not INI. Sometimes it's nice to have multiple layers of nesting and things like lists. I would strongly prefer that approach. This would also integrate nicer with ansible.

@eloquence eloquence removed this from the 0.7 milestone Apr 5, 2018
@eloquence eloquence added this to the Long Term Product Backlog milestone Apr 5, 2018
@eloquence eloquence added the epic Meta issue tracking child issues label Apr 6, 2018
@heartsucker
Copy link
Contributor

A long while ago I talked with @squeed about formats and he said either JSON or INI on the grounds that other things (esp. YAML) have cause problems for him in real world scenarios. I still think we want data structures that are more complex than what INI provides.

Is there a compelling reason to use INI over JSON?

@kushaldas
Copy link
Contributor

@heartsucker this was a POC with only stdlib, the general feedback I received was to use YAML.

@heartsucker
Copy link
Contributor

If the feed back is YAML, I'd propose we use JSON then given his recommendations from other deployments, yeah?

@squeed
Copy link
Contributor

squeed commented Aug 27, 2018

This is a total bikeshed, but, json is a bit nicer than yaml, especially when being written and read programatically. Yaml has some parsing surprises (e.g. number vs. string).

My suggestion for INI-files was that it enables systemd-style dropins, which are amazingly useful. They allow for overriding arbitrary entries without the need for sed.

@heartsucker
Copy link
Contributor

I think we don't need systemd dropins so much. I think the master config should be things that are hardcoded and will never change (secrets), and as many as possible other config options should be done from the web interface and stored in the sql db. We are generally moving toward SD as being more of a plug-n-play appliance than a configurable web app, so tinkering with something like WSGI worker settings or other such things will eventually not be an option.

@heartsucker
Copy link
Contributor

Additional notes:

We have a lot of config values that should never be controlled by the user and are only necessary to have controlled during testing/dev. These should not be read from a config file, but should have default values in the SDConfig object that are read by the app. We can programmatically override them during dev/testing. For example, wordlist file locations, or session expiration times. Altering these has the potential to break the app or remove certain security guarantees.

@heartsucker
Copy link
Contributor

Proposed format:

{
  "source_interface": {
    "secret_key": "j2IcwRt6A2IsUeNEChZqp66cn7cpf/oP6rxO9d8M2E8=",
    "scrypt_id_pepper": "6KTTMPB+0RNV6TREDVD/7r8m/Pm5otK+7eyfaYWTm6s=",
    "scrypt_gpg_pepper": "GlAWMF9rk6eH671IPiu9pBu06PmvKR+LnC+wQI/jqkM=",
    "i18n": {
      "default_locale": "en_US",
      "supported_locales": ["en_US"]
    }
  },
  "journalist_interface": {
    "secret_key": "SGOuAsFRqs145PTueeMyCnMtLotjDldNL08MlzEHWdg=",
    "scrypt_id_pepper": "6KTTMPB+0RNV6TREDVD/7r8m/Pm5otK+7eyfaYWTm6s=",
    "scrypt_gpg_pepper": "GlAWMF9rk6eH671IPiu9pBu06PmvKR+LnC+wQI/jqkM=",
    "i18n": {
      "default_locale": "en_US",
      "supported_locales": ["en_US"]
    }
  },
}

Note that some values are duplicated so that in the future we can split the configs into two files to that the JI won't have access to the SI secrets and vice versa.

These are actually all the values that should be configurable from the admin side. The rest should be hardcoded and unchangeable in prod.

@heartsucker
Copy link
Contributor

While working on this, I have hit annoying errors related to the fact that not all tests use pytest fixtures. I'm going to call #2877 a soft blocker on this because I haven't been able to figure out a work around though it may be possible.

@squeed
Copy link
Contributor

squeed commented Sep 9, 2018

question: are site-specific but theoretically changeable things, like domain, configured in this manner?

@heartsucker
Copy link
Contributor

config.py is only for python/flask specific things. Other config values come from prod-specific.yml which populates thinks like virtual hosts in the apache config and so on.

@eloquence eloquence added the needs/discussion queued up for discussion at future team meeting. Use judiciously. label Jun 23, 2020
@eloquence
Copy link
Member

Per #3850 (comment) our bias is to incrementally move more options into the DB so they can be managed through the web interface. We may still ultimately want to switch to a more maintainable format for other configuration values, so leaving this issue open for discussion (though we may close in favor of more narrowly scoped improvements).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Meta issue tracking child issues goals: packaging needs/discussion queued up for discussion at future team meeting. Use judiciously. ops/deployment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants