
document how to clean up from a previous libvirt make staging run #4993

Closed · redshiftzero opened this issue Nov 14, 2019 · 17 comments

Comments

@redshiftzero

Description

This is currently not documented, and a few developers are having trouble with it.

@sssoleileraaa commented Nov 14, 2019

Following up after troubleshooting errors during make build-debs and make staging...

If you see this error, start over, all the way over. Keep reading...

[Screenshot from 2019-11-14 12-00-06: console boot output ending with "ima: No TPM chip found, activating TPM-bypass! (rc=-19)"]

Also, be prepared: make staging takes around 17 minutes after a make clean.

This is what I tried when attempting to reinstall staging:

  1. make clean
  2. virtualenv .venv --python=python3.7 and source .venv/bin/activate
  3. pip install --require-hashes -r securedrop/requirements/python3/develop-requirements.txt
  4. make build-debs (a consolidated sketch of steps 1-4 follows this list)
    Saw a quay.io server error indicating that the registry was down (only for ~5-10 minutes, it seems):
    TASK [Create builders] *********************************************************
...
    failed: [localhost] (item={'name': 'xenial-sd-generic-ossec-server2', 'groups': ['builders']}) => {"changed": false, "item": {"groups": ["builders"], "name": "xenial-sd-generic-ossec-server2"}, "msg": "Error pulling image quay.io/freedomofpress/sd-docker-builder-xenial:sha256:0288d35d316047302e6e15887eb34fb5440415054835cf0c0f25f5cc8ab80279 - 500 Server Error: Internal Server Error (\"b'{\"message\":\"Get https://quay.io/v2/: dial tcp: lookup quay.io on 75.75.76.76:53: server misbehaving\"}'\")"}
  5. manually deleted the app and mon VMs in libvirt's Virtual Machine Manager application
  6. molecule destroy -s libvirt-staging-xenial, and also checked that /tmp/molecule/ did not exist in case I needed to delete /tmp/molecule/securedrop -- it did not
  7. make staging
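For reference, a consolidated sketch of steps 1-4 above, assuming you are running from the root of a securedrop checkout with Python 3.7 available:

# rebuild the virtualenv and the Debian packages from scratch
make clean
virtualenv .venv --python=python3.7
source .venv/bin/activate
pip install --require-hashes -r securedrop/requirements/python3/develop-requirements.txt
make build-debs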

I saw an error during make staging that said to read the log file, which said:

An action 'up' was attempted on the machine 'mon-staging',
but another process is already executing an action on the machine.
Vagrant locks each machine for access by only one process at a time.
Please wait until the other Vagrant process finishes modifying this
machine, then try again.
If you believe this message is in error, please check the process
listing for any "ruby" or "vagrant" processes and kill them. Then attempt this command again.
  1. ps -ef | grep ruby and killed the zombie ruby processes (I also checked that /tmp/molecule/ did not exist in case I needed to delete /tmp/molecule/securedrop); see the sketch after this list
  2. make staging
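A rough sketch of that stale-process cleanup; the pkill pattern is an assumption, so review the ps output before killing anything:

# find leftover Vagrant/ruby processes still holding the machine lock
ps -ef | grep -E '[r]uby|[v]agrant'
# once you have confirmed they are stale, kill them (this pattern is an assumption)
pkill -f 'vagrant up' || true
# confirm no stale Molecule state is lying around
ls -d /tmp/molecule/securedrop 2>/dev/null || echo 'no stale molecule state'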

I saw another error during make staging that said to look at the logs, which said:

Volume for domain is already created. Please run 'vagrant destroy' first.
  1. vagrant destroy and rerun make staging -- same error
  2. sudo virsh
  3. in the virsh shell, delete any leftover staging volumes you see, e.g. vol-delete --pool default libvirt-staging-xenial_mon-staging.img (see the sketch after this list)
  4. make staging passes without error
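For completeness, the volume cleanup from the host shell looks roughly like this; the volume names follow the libvirt-staging-xenial_<vm>-staging.img pattern above, and default as the pool name is an assumption about a stock libvirt setup:

# list volumes in the default storage pool and look for stale staging images
sudo virsh vol-list --pool default
# delete any leftover staging volumes, using the names reported by vol-list
sudo virsh vol-delete --pool default libvirt-staging-xenial_mon-staging.img
sudo virsh vol-delete --pool default libvirt-staging-xenial_app-staging.img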

After opening the app VM again from libvirt's Virtual Machine Manager application, I see this error once more:

[Screenshot from 2019-11-14 12-00-06: console boot output ending with "ima: No TPM chip found, activating TPM-bypass! (rc=-19)"]

@conorsch

Thanks for the detailed feedback, @creviera. I'll cobble together some docs clarifications, and tag you for review.

One major oversight is that make clean does not destroy the Molecule-based scenario at all! So that'll be a fix I include. Right now, I'm reluctant to add sudo commands to the clean action, which would be required to warn about "VMs have been detected in libvirt, but Molecule knows nothing about them." We might have to bounce some ideas back and forth to cover that edge case clearly.
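For anyone hitting that edge case before the docs land, a rough sketch of how to find libvirt resources that Molecule no longer knows about; the domain names below are assumptions based on the volume-name pattern earlier in this thread, so double-check virsh list --all before destroying anything:

# tear down the Molecule scenario if it still exists
molecule destroy -s libvirt-staging-xenial
# list every libvirt domain and volume, including orphans Molecule has forgotten
sudo virsh list --all
sudo virsh vol-list --pool default
# remove an orphaned staging domain by hand (name is illustrative)
sudo virsh destroy libvirt-staging-xenial_mon-staging
sudo virsh undefine libvirt-staging-xenial_mon-staging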

@sssoleileraaa

Yeah, I just want to highlight the part of my long comment where I still see the ima: No TPM chip found, activating TPM-bypass! (rc=-19) message, which indicates to me that something is wrong with the latest changes on develop (since I last updated my instance a long time ago), because it seems like the image isn't working on my hardware anymore.
[Screenshot from 2019-11-14 12-00-06: console boot output ending with "ima: No TPM chip found, activating TPM-bypass! (rc=-19)"]

redshiftzero added this to Current Sprint - 11/6-11/20 in SecureDrop Team Board on Nov 15, 2019
@kushaldas

@creviera Which hardware is this? I am guessing the standard T470?

@kushaldas

$ vagrant --version 
Vagrant 2.2.6

On Buster, I can see the error, but everything works as usual.

[Screenshot "notpm": console showing the same no-TPM boot message]

@sssoleileraaa

@creviera Which hardware is this? I am guessing the standard T470?

X1 Carbon (6th Gen)

@sssoleileraaa

Today I downgraded the kernel via: sudo apt remove linux-image-4.14.154-grsec-securedrop so that I would be using linux-image-4.4.182-grsec instead.

The result:

I get past the error that both @kushaldas and I shared above and am able to see and use the login screen in libvirt's Virtual Machine Manager application.

Interestingly, it's only the console login that is broken -- I'm able to log in via ssh just fine using linux-image-4.14.154-grsec-securedrop.
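For anyone reproducing this, a rough sketch of the check/downgrade sequence inside a staging VM, using the package names mentioned above (run it over ssh, since the console is the broken path):

# confirm which kernel the VM is currently running
uname -r
# list the installed kernel packages
dpkg -l | grep linux-image
# remove the 4.14 grsec kernel so the VM falls back to 4.4, then reboot
sudo apt remove linux-image-4.14.154-grsec-securedrop
sudo reboot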

@emkll commented Nov 19, 2019

Thanks for the report, @creviera. The console/TTY kernel settings are identical for our 4.4 and 4.14 configurations. I will test on hardware to see if I can reproduce, and try building with the latest grsecurity patch to see if it resolves the issue (the kernel version is still 4.14.154, but a couple of grsecurity patches have been released since). Since this doesn't immediately block development (the VMs work and ssh works; only console access is affected), we can track this issue as part of general (kernel) QA for 1.2.0.

@rmol commented Nov 19, 2019

I was able to reproduce the console hang in staging, and can fix it by switching the VM video driver from Cirrus to Virtio. I've found other reports of Cirrus misbehaving in VMs; we could specify a different video driver in the Vagrantfile.
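As a quick check (or stopgap) on an existing staging domain, the video model can be inspected and edited directly from the host; the domain name below is illustrative, so confirm it with virsh list --all first:

# show the current video device for the mon staging domain
sudo virsh dumpxml libvirt-staging-xenial_mon-staging | grep -A3 '<video>'
# open the domain XML and change <model type='cirrus' .../> to <model type='virtio'/>
sudo virsh edit libvirt-staging-xenial_mon-staging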

@emkll commented Nov 19, 2019

Good catch @rmol, it may have to do with the initial kernel configuration; I did not explicitly enable any virtualization settings (host or guest). Perhaps virtio video support is not included because guest virtualization was not enabled. Given that the VMs otherwise work in libvirt, the issue may be easier to fix in the Vagrantfile, as you describe. If these (and the console) work in VirtualBox, I think your suggestion sounds like the quickest path to resolution.

@redshiftzero (author) commented Nov 19, 2019

The latest kernel (4.14.154-grsec-securedrop) does work in VirtualBox, console included.

@kushaldas

I got stuck on this today :(

@redshiftzero (author)

Are you stuck on logging in to the console of staging machines running on libvirt (asking because, if so, #5005 contains a fix), or on fully destroying previously provisioned staging VMs on libvirt?

@emkll commented Nov 20, 2019

@conorsch and @rmol have some work in progress to close this issue in https://github.com/freedomofpress/securedrop/compare/4993-explain-libvirt-cleanup-procedures

@eloquence

During the 12/5-12/18 sprint, we'll aim to get a PR in for the changes above, and @zenmonkeykstop will help investigate whether this is sufficient (together with the #5005 fix which has landed) to resolve the types of hangs experienced by Kushal and Allie above.

SecureDrop Team Board automation moved this from Current Sprint - 12/5-12/18 to Done Dec 9, 2019
@conorsch commented Dec 9, 2019

Reopening until we sort out #5066.

conorsch reopened this on Dec 9, 2019
@eloquence

#5066 was merged, so closing.
