Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Qemu emulation on GHA for Vagrant tests #503

Merged
merged 3 commits into from Feb 9, 2021

Conversation

mikedep333
Copy link
Member

@mikedep333 mikedep333 commented Dec 4, 2020

Adapted from:
https://github.com/pulp/pulplift/pull/66
"RFC: Testing nested Virtualization"

Implementation Includes:
1. Upgrade Qemu from 4.4 to 5.2 from our PPA to address a
severe bug affecting CentOS 7 guests, they could not even validate
SSL certs with curl / yum or create the Pulp postgres database.
2. Upgrade the rest of the virtualization stack on Ubuntu
3. Address the EL8 vagrant-sshfs workaround task failing due to
a GPG signature mismatch.
4. Workaround a bug with VM storage on the newer virtualization
stack.
5. Switch the boxes used on CentOS 7 for more recent updates.

workaround #8095: FIPS failure in geerlingguy.postgresql by
using an old version.
https://pulp.plan.io/issues/8095

workaround #7993:
pulp_installer fails to create the database on EL7 when LANG=C.UTF-8
https://pulp.plan.io/issues/7993

fixes: #7884
Move the pulp_installer Vagrant tests off Travis
https://pulp.plan.io/issues/7884

@mikedep333 mikedep333 force-pushed the qemu-emulation branch 11 times, most recently from 4a7e9a8 to c86a2f6 Compare December 4, 2020 20:00
@mikedep333 mikedep333 force-pushed the qemu-emulation branch 18 times, most recently from f5f4efe to 95794c4 Compare December 9, 2020 01:07
@mikedep333 mikedep333 force-pushed the qemu-emulation branch 7 times, most recently from ae082c4 to 64c8add Compare January 25, 2021 18:01
@mikedep333 mikedep333 changed the title Test Qemu emulation Use Qemu emulation for Vagrant Tests Jan 25, 2021
@mikedep333 mikedep333 changed the title Use Qemu emulation for Vagrant Tests Use Qemu emulation on GHA for Vagrant tests Jan 25, 2021
@mikedep333 mikedep333 force-pushed the qemu-emulation branch 3 times, most recently from 3742f78 to 3c0df58 Compare January 26, 2021 15:29
@mikedep333 mikedep333 force-pushed the qemu-emulation branch 2 times, most recently from 55ef148 to c7fa77b Compare January 27, 2021 20:46
To ensure that git recognizes that they are existing ones rather than
new ones. This would likely be a problem as we are changing them
heavily in the next commit.

[noissue]
@mikedep333 mikedep333 force-pushed the qemu-emulation branch 2 times, most recently from 9312900 to c47ac0b Compare February 8, 2021 20:13
fi

sed -i -e 's/memory: 10500/memory: 5500/g' vagrant/boxes.d/*
sed -i -e 's/cpus: 4/cpus: 2/g' vagrant/boxes.d/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does GHA not handle 4?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GHA only has 2 virtual CPUs, just like Travis.

These 2 lines only have any effect if when we run pulp2-nightly-pulp3-source-centos7 on CI, which we currently do not do.

@@ -99,3 +99,31 @@ jobs:
PY_COLORS: '1'
ANSIBLE_FORCE_COLOR: '1'
shell: bash
vagrant:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should run for branches instead of PRs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'd test immediately after merging?

I'm open to that. I figured we'd adjust the schedule according to dev feedback. Or look into enabling merging while these tests still have not completed yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

Comment on lines +18 to +19
# timeout 90 is needed Pulp to service its 1st request on extremely slow
# machines, such as qemu-emulated 2-core machines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs a new changelog entry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. I'll make it a separate commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerrod3 this is the change I mentioned

Comment on lines +18 to +19
# timeout 90 is needed Pulp to service its 1st request on extremely slow
# machines, such as qemu-emulated 2-core machines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs a new changelog entry

Copy link
Member Author

@mikedep333 mikedep333 Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll combine it with pulpcore-api's commit/changelog entry, yeah. It may not even be needed for pulpcore-content, but I did it at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerrod3 same here

roles/pulp_health_check/tasks/main.yml Outdated Show resolved Hide resolved
- name: Ensure Pulp is up and healthy
uri:
url: "http://{{ __pulp_default_host if pulp_api_bind.startswith('unix:') else pulp_api_bind }}/pulp/api/v3/status/"
status_code: "200"
unix_socket: "{{ pulp_api_bind | regex_replace('^unix:', '') if pulp_api_bind.startswith('unix:') else omit }}"
# Alternate between 30 second timeouts and 5 second timeouts when handling the situation of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these magic numbers?

Copy link
Member Author

@mikedep333 mikedep333 Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30 is the default. It handles hypothetical extremely long times for the webserver to produce the page.

We intermix it with 5 second waits, because if the server isn't up yet and suddenly is, a 5 second wait usually is sufficient I would think. Just my guestimate.

If your question is about why I combine a short and a long time out together, the purpose is 2:

  1. Is to lower the total amount of time that the CI or the user may wait on the health check when we have a "connection timed out". 38 tries * (30 seconds + 6 second wait) + 37 tries * (5 seconds + 6 second wait) = 29 minutes, rather than 45 minutes.
  2. Still passing a theoretical 29.9 second wait.
  3. Ensuring a large number of checks over a medium period of time (75 tries * 6 second wait = 7.5 min) when the server does "connection refused," which takes under a second.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was more from where they came from, thank you for all the explanation

Solution: Raise the the Pulp server's gunicorn worker timeout to 90
seconds.

fixes: #8228
Pulp Connection Timed Out on slow emulated machines
https://pulp.plan.io/issues/8228
Adapted from:
pulp/pulplift#66
"RFC: Testing nested Virtualization"

Implementation Includes:
1. Upgrade Qemu from 4.4 to 5.2 from our PPA to address a
severe bug affecting CentOS 7 guests, they could not even validate
SSL certs with curl / yum or create the Pulp postgres database.
2. Upgrade the rest of the virtualization stack on Ubuntu
3. Address the EL8 vagrant-sshfs workaround task failing due to
a GPG signature mismatch.
4. Workaround a bug with VM storage on the newer virtualization
stack.
5. Switch the boxes used on CentOS 7 for more recent updates.
6. Reducing how long the pulp health check may take, particularly
when there is a connection timed out.

workaround #8095: FIPS failure in geerlingguy.postgresql by
using an old version.
https://pulp.plan.io/issues/8095

workaround #7993:
pulp_installer fails to create the database on EL7 when LANG=C.UTF-8
https://pulp.plan.io/issues/7993

fixes: #7884
Move the pulp_installer Vagrant tests off Travis
https://pulp.plan.io/issues/7884
Copy link
Member

@fao89 fao89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Mike!

@pulpbot
Copy link
Member

pulpbot commented Feb 9, 2021

Attached issue: https://pulp.plan.io/issues/8228

Attached issue: https://pulp.plan.io/issues/7884

@mikedep333 mikedep333 merged commit 011c2cb into pulp:master Feb 9, 2021
@mikedep333 mikedep333 deleted the qemu-emulation branch February 9, 2021 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants