Migrate SecureDrop servers to Ubuntu 16.04 (Xenial) #3204

Closed
25 tasks done
eloquence opened this issue Mar 29, 2018 · 10 comments
Labels: epic (Meta issue tracking child issues), ops/deployment

Comments

@eloquence
Member

eloquence commented Mar 29, 2018

Ubuntu Trusty (14.04) reaches EOL in April 2019. We should upgrade SecureDrop servers to Ubuntu Xenial (16.04) before then. This will also unblock several currently blocked issues.

(Please keep discussion about moving to 18.04 out of scope of this issue. We will consider the best path to 18.04, but will not immediately go from 14.04 to 18.04.)

Initially this epic captures only preliminary work; we will update it as we discover more work. The preliminary work must affect only the development environment and must not have production consequences.

Tasks:

In current sprint

Stretch goals

@conorsch
Contributor

Config tests are passing against Xenial. I'll do a few repeated runs on both Trusty and Xenial to make sure there aren't any test flakes. On the subject of flakes, over in #3206 (comment) I noted:

OSSEC (the no-messages-containing "ERROR: Incorrectly formated message from" check is failing; that's bad, and could take a while to debug).

Pleased to report that was a false alarm! The corrupted messages were caused by my running a separate app-staging VM concurrently, for use in developing the SecureDrop Workstation. Since the app-staging IPv4 addresses are hard-coded in staging, the two machines were fighting, and OSSEC was correctly reporting bad messages due to inconsistent key exchange. Powering off the secondary app-staging machine and rebuilding resolved all OSSEC-related errors in the config tests.
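For anyone chasing similar OSSEC noise in staging, the quickest confirmation is to grep the OSSEC logs on the mon VM and list the registered agents. A rough sketch, assuming the stock OSSEC paths (which is what the mon server uses, as far as I know):

```bash
# On mon-staging: look for the malformed-message errors OSSEC raises when
# two hosts end up sharing an agent identity (e.g. duplicate app-staging VMs).
sudo grep "Incorrectly formated message" /var/ossec/logs/ossec.log

# List registered agents to spot duplicates or stale entries.
sudo /var/ossec/bin/agent_control -l
```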

@heartsucker
Contributor

Hey, so this came up when chatting with @redshiftzero recently, and I figured I'd leave my thoughts here.

We had a timeboxed upgrade attempt from Trusty to Xenial, but I think we should not do this on production boxes. I have a few times upgraded laptops from one Debian version to the immediate next using apt dist-upgrade and fiddling with the sources.list{,.d} entries. This has worked insofar as I had a functional operating system and most things "just worked" after the upgrade and the next reboot.
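For concreteness, the in-place upgrade pattern described here is roughly the following. This is a generic sketch of the manual sources.list approach, not a supported SecureDrop procedure; the release names are purely illustrative:

```bash
# Generic Debian/Ubuntu in-place release upgrade (illustrative only):
# point apt at the next release, then dist-upgrade and reboot.
sudo sed -i 's/trusty/xenial/g' /etc/apt/sources.list
sudo sed -i 's/trusty/xenial/g' /etc/apt/sources.list.d/*.list   # if any extra entries exist
sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot
```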

Where things didn't work was in cases where a piece of software was included as the default in version N-1 but not in version N. I mean this specifically in the case of the Wheezy to Jessie upgrade, when the OS switched from sysvinit to systemd. That was fine because there was compatibility with all the packages, but it made debugging a lot harder because searches like "wheezy upstart tor" or whatever just had crap results.

It's also possible to do a bunch of manual fiddling to get the system into the next state by installing, removing, updating alternatives, and updating the default configs, but this is a lot of work. I hesitate to suggest we do this because one problem we have with SD is that right now it's treated like a package/service when in reality it's more of an appliance. Which is to say, we don't have full control over, or even introspection into, the base OS, so if things get a little out of whack, we just don't know.

I say we should tell admins that they need to do a full reinstall, and give them one (two?) releases of SD in which to do that, during which we support both Trusty and Xenial. We already have the backup / restore script that does the magic of replacing the files, so this is not so burdensome.
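For reference, the reinstall-plus-restore path would look roughly like this from the Admin Workstation. This is a sketch assuming the existing securedrop-admin backup and restore actions; the exact invocation and backup file name may differ:

```bash
# On the Admin Workstation, before wiping the servers:
cd ~/Persistent/securedrop
./securedrop-admin backup                      # writes sd-backup-<timestamp>.tar.gz locally

# After reinstalling the servers on Xenial and re-running the install:
./securedrop-admin restore sd-backup-<timestamp>.tar.gz   # <timestamp> is a placeholder
```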

My biggest concern isn't that we'd botch the Trusty to Xenial upgrade; in fact I bet we could totally nail it. The concern is that it's complex and that knowledge will be lost quickly (people forget, and likely only 2-3 people on the team will really fully understand it), and then anyone making future changes will have to remember that systems could be fresh Xenial installs or upgraded Xenials when considering all ops / app related problems.

Another advantage of just mandating reinstalls is that it's less engineering effort on our side, and this deadline feels very close. Given that constraint, it seems like the safer choice even if we disregard the long-term complexities I mentioned above.

Also, I'm acknowledging that I haven't been paying super close attention to this ticket/epic and we may be too far down the line to change this now. Or we may have addressed these concerns already, and what I'm saying here is out of date.

@kushaldas
Contributor

I say we should tell admins that they need to do a full reinstall, and give them one (two?) releases of SD in which to do that, during which we support both Trusty and Xenial. We already have the backup / restore script that does the magic of replacing the files, so this is not so burdensome.

If we have to ask for a reinstall, this is a good time to think about other options on the server side too. For example, the read-only file system of Atomic (the PoC I did).

@redshiftzero
Contributor

Another advantage of just mandating reinstalls is that it's less engineering effort on our side, and this deadline feels very close.

This might reduce engineering effort on our side, but it shifts that effort onto administrators and onto FPF through support (i.e. some core engineering staff would need to travel to assist with reinstalls). Many administrators already did fresh reinstalls late last year, which slowed development for about a month while core team members traveled to install SecureDrops at major news organizations. In my opinion, we should not require reinstalls unless there is an unavoidable technical reason (we haven't come across one yet), as I don't want us to largely halt development in order to travel around and do them. Unfortunately, if we don't assist people (ignoring the fact that we have support contracts with a bunch of news organizations) and we don't provide an upgrade path, a significant fraction of instances could end up on an EOL OS or stop using SecureDrop altogether, either of which is a bad outcome for sources.

If we have to ask for a reinstall, this is a good time to think about other options on the server side too. For example, the read-only file system of Atomic (the PoC I did).

We need to complete the Xenial transition in the next few months. In SecureDrop's current state, I don't see how we could be ready to move to a read only file system before Trusty EOLs. What do you think?

@zenmonkeykstop
Contributor

zenmonkeykstop commented Nov 29, 2018

A reinstall and restore is definitely the easier target for us to hit, but I'm loath to recommend it, as I feel it puts a lot of responsibility on admins, especially those who didn't install the system in the first place (because they inherited it or because FPF did it for them). There are a lot of steps where things could go wrong, and it would be hard for us to know how they went about doing the upgrade if things did go wrong.

@heartsucker, given that we're talking about systems where we have a pretty good idea of their current state, how much of the upgrade complexity you mention could be identified ahead of time and reproduced in an Ansible playbook or similar?

@kushaldas
Contributor

We need to complete the Xenial transition in the next few months. In SecureDrop's current state, I don't see how we could be ready to move to a read only file system before Trusty EOLs. What do you think?

I am not saying we should do this before moving to Xenial, but we should keep our options open for the future. And a reinstall can be part of that story (in the long term).

@zenmonkeykstop
Contributor

zenmonkeykstop commented Nov 29, 2018

Also, I feel that no matter what, this won't be an unattended update. The workflow could go something like this:

  1. we push out an updated securedrop-admin with an os-upgrade task that invokes Ansible to do the upgrade
  2. when they're ready, admins:
    1. back up their instance
    2. run securedrop-admin os-upgrade

(Actually, the os-upgrade task should probably just do the backup automatically, or prompt for it.)
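A minimal sketch of what that could look like from the Admin Workstation; the os-upgrade action is hypothetical and does not exist yet:

```bash
# Hypothetical admin flow; `os-upgrade` is a placeholder action.
cd ~/Persistent/securedrop
./securedrop-admin backup        # take a backup first (or have os-upgrade prompt for one)
./securedrop-admin os-upgrade    # would invoke an Ansible playbook to perform the upgrade
```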

(One thing that would be cool to have from the support perspective is a way to verify the OS in use by a given instance, so we could have a Nagios-style check for it like the one for the SD version. That could have security implications, though the OSes an SD instance is likely to be running are hardly a secret.)
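A hypothetical sketch of such a check, emitting a Nagios-style status line for the running release (no such check exists today):

```bash
#!/bin/bash
# Hypothetical check: report the running Ubuntu release in Nagios plugin style.
release="$(lsb_release -rs 2>/dev/null || echo unknown)"
if [ "$release" = "16.04" ]; then
    echo "OK - Ubuntu ${release}"
    exit 0
else
    echo "WARNING - Ubuntu ${release} (expected 16.04)"
    exit 1
fi
```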

@zenmonkeykstop
Contributor

@ninavizz
Member

ninavizz commented Feb 6, 2019

Tagging myself to get this on my radar.

@redshiftzero
Contributor

I removed #3208 from this epic because it needs further discussion and does not need to be coupled to this issue. Closing this epic as the other work has been completed.

redshiftzero unpinned this issue Apr 16, 2019