
Don't stop certmonger during IPA uninstallation #5505

Closed
wants to merge 1 commit into from

Conversation

rcritten
Contributor

@rcritten rcritten commented Feb 1, 2021

An effort to prevent CI failures.

https://pagure.io/freeipa/issue/8533

Signed-off-by: Rob Crittenden rcritten@redhat.com

@rcritten rcritten added the WIP Work in progress - not ready yet for review label Feb 1, 2021
@rcritten
Contributor Author

rcritten commented Feb 1, 2021

It passed, re-running.

@rcritten
Contributor Author

rcritten commented Feb 1, 2021

Passed five times, re-running.

@rcritten
Contributor Author

rcritten commented Feb 1, 2021

@flo-renaud do you think there is a better test to exercise this re-ordering of the uninstall?

@flo-renaud
Contributor

@rcritten test_cert_fix and test_replica_promotion_TestHiddenReplicaPromotion are probably the jobs that fail the most frequently. But I think the right ticket is https://pagure.io/freeipa/issue/8506 for this issue (8533 happens during replica installation while configuring the cert tracking).

@rcritten
Contributor Author

rcritten commented Feb 2, 2021

Thanks for the suggestion, updating the test. I'll have to remember to go back to fix the ticket number.

@rcritten
Contributor Author

rcritten commented Feb 2, 2021

Both invocations of test_integration/test_replica_promotion.py::TestHiddenReplicaPromotion failed due to DNS issues. Re-running.

@rcritten rcritten added the re-run Trigger a new run of PR-CI label Feb 2, 2021
@freeipa-pr-ci freeipa-pr-ci removed the re-run Trigger a new run of PR-CI label Feb 2, 2021
@rcritten
Contributor Author

rcritten commented Feb 2, 2021

Two more failures, unrelated to dbus.

@rcritten
Contributor Author

rcritten commented Feb 3, 2021

Trying test_integration/test_caless.py::TestReplicaCALessToCAFull to provoke a failure. If I can't, I may drop the change to see whether I can reproduce the problem more easily, then add the change back to see whether it actually helped.

@freeipa-pr-ci freeipa-pr-ci added the needs rebase Pull Request cannot be automatically merged - needs to be rebased label Feb 4, 2021
@rcritten
Contributor Author

rcritten commented Feb 4, 2021

Rebased and dropped the change to see whether one or more of the tests fail.

@rcritten rcritten removed the needs rebase Pull Request cannot be automatically merged - needs to be rebased label Feb 5, 2021
@rcritten
Contributor Author

rcritten commented Feb 5, 2021

temp_commit_2 reproduced the uninstall dbus error:

[ipatests.pytest_ipa.integration.host.Host.master.cmd54] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

@rcritten
Contributor Author

rcritten commented Feb 9, 2021

All five executions of test_ipa_cert_fix.py failed with the dbus error with the re-organized code change applied.

@rcritten rcritten force-pushed the issue_8533 branch 9 times, most recently from 8f431a3 to e82d7ea Compare February 9, 2021 22:15
@rcritten
Contributor Author

rcritten commented Feb 9, 2021

I think I may have it worked out, but I've heavily redone the patch so we'll see. Originally I was trying to move the certmonger CA removal ahead of shutting down IPA. I've dropped that and gone with the original code in cainstance.py, but wrapped it in a loop.

dbus is socket activated, and I think it sometimes takes longer to start up than certmonger is willing to wait. So I put the call into a loop that tries up to 5 times.

In testing it never took more than 2 tries with no sleep.
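A minimal sketch of the retry idea described above, assuming a hypothetical `DBusNoReply` exception as a stand-in for `org.freedesktop.DBus.Error.NoReply` and an arbitrary zero-argument callable for the certmonger operation. This is not the code from this PR; it only illustrates a bounded loop of up to 5 attempts with no sleep between them.

```python
import logging

logger = logging.getLogger(__name__)


class DBusNoReply(Exception):
    """Hypothetical stand-in for org.freedesktop.DBus.Error.NoReply."""


def call_with_retry(func, description, attempts=5):
    """Call func() up to `attempts` times, retrying only on DBusNoReply.

    There is deliberately no sleep between attempts: in the runs described
    above, a second try was enough once dbus/certmonger was responsive.
    """
    for attempt in range(1, attempts + 1):
        logger.debug("Attempt %d to %s", attempt, description)
        try:
            result = func()
        except DBusNoReply as e:
            # No reply does not mean the call failed; just try again.
            logger.debug("%s", e)
            continue
        logger.debug("%s successful", description)
        return result
    # Every attempt timed out; surface that to the caller.
    raise RuntimeError(f"{description} failed after {attempts} attempts")
```

Retrying only on the no-reply error keeps real failures (bad arguments, permission errors) from being masked by the loop.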

@rcritten
Copy link
Contributor Author

What I observed in the last set of logs is that the request to stop tracking failed with a DBus timeout, but the request was actually successful. I've tuned up the logging around this so it is clearer what happened.

We would see a message like "Some certificates may still be tracked by certmonger." in the server uninstall log if there were remaining certificates. I haven't seen this in these dbus timeout failure cases.

@rcritten
Copy link
Contributor Author

For the reviewer:

There are two categories of errors here:

  1. Stopping tracking of the CA certificates in certmonger fails with a DBus.Error.NoReply.

A handled failure looks like:

2024-02-11T15:50:57Z DEBUG Attempt 1 to stop tracking for 'auditSigningCert cert-pki-ca in /etc/pki/pki-tomcat/alias'
2024-02-11T15:51:22Z DEBUG org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2024-02-11T15:51:22Z DEBUG Attempt 2 to stop tracking for 'auditSigningCert cert-pki-ca in /etc/pki/pki-tomcat/alias'
2024-02-11T15:51:43Z DEBUG org.fedorahosted.certmonger.no_such_entry: No matching entry found.
2024-02-11T15:51:43Z DEBUG Tracking no longer exists, stop trying
2024-02-11T15:51:43Z DEBUG Attempt 1 to stop tracking for 'ocspSigningCert cert-pki-ca in /etc/pki/pki-tomcat/alias'
2024-02-11T15:51:43Z DEBUG stop tracking successful

What you see here is the first attempt failing and the error being caught, then a second attempt finding that the tracking request no longer exists. This means that the original request succeeded but the result was not communicated back.

A success on the first try looks like:

2021-02-10T16:02:37Z DEBUG Attempt 1 to stop tracking for /var/kerberos/krb5kdc/kdc.crt
2021-02-10T16:02:37Z DEBUG stop tracking successful

I tend to see this failure in test_integration-test_ipa_cert_fix.py-TestIpaCertFix-uninstall. You'll find the entries in the uninstall log.
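The two outcomes in the first log above can be distinguished inside the retry loop itself. In this sketch, `DBusNoReply`, `NoSuchEntry` and the `stop_tracking` callable are hypothetical stand-ins for the DBus error, the certmonger `no_such_entry` error and the real certmonger call; the point is only that a "no matching entry" answer on a retry is treated as success rather than failure.

```python
import logging

logger = logging.getLogger(__name__)


class DBusNoReply(Exception):
    """Hypothetical stand-in for org.freedesktop.DBus.Error.NoReply."""


class NoSuchEntry(Exception):
    """Hypothetical stand-in for org.fedorahosted.certmonger.no_such_entry."""


def stop_tracking_with_retry(stop_tracking, nickname, attempts=5):
    """Return True once the tracking request is known to be gone.

    `stop_tracking` is a hypothetical callable that asks certmonger to
    drop the request for `nickname`. Returns False if every attempt
    timed out, so the caller can warn that tracking may remain.
    """
    for attempt in range(1, attempts + 1):
        logger.debug("Attempt %d to stop tracking for %r", attempt, nickname)
        try:
            stop_tracking(nickname)
        except DBusNoReply as e:
            # The request may or may not have been applied; ask again.
            logger.debug("%s", e)
            continue
        except NoSuchEntry as e:
            # On a retry this means an earlier call succeeded but its reply
            # was lost. Either way there is nothing left to stop tracking.
            logger.debug("%s", e)
            logger.debug("Tracking no longer exists, stop trying")
            return True
        logger.debug("stop tracking successful")
        return True
    return False
```

A usage sketch with a fake certmonger that applies the first request but drops its reply:

```python
tracked = {"auditSigningCert cert-pki-ca"}

def fake_stop(nickname):
    if nickname not in tracked:
        raise NoSuchEntry("No matching entry found.")
    tracked.discard(nickname)
    raise DBusNoReply("Did not receive a reply.")

stop_tracking_with_retry(fake_stop, "auditSigningCert cert-pki-ca")  # True
```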

The other error happens when removing the IPA-created CAs from certmonger. Again no response comes back, but the retry succeeds without a not-found error.

A success looks like:
2024-02-11T15:51:45Z DEBUG Attempt 1 to remove certmonger CAs
2024-02-11T15:51:45Z DEBUG certmonger CA removal successful

A handled failure looks like:
2021-02-10T16:02:06Z DEBUG Attempt 1 to remove certmonger CAs
2021-02-10T16:02:31Z DEBUG org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
2021-02-10T16:02:31Z DEBUG Attempt 2 to remove certmonger CAs
2021-02-10T16:02:32Z DEBUG certmonger CA removal successful

I tend to see this most in test_integration-test_caless.py-TestReplicaCALessToCAFull-uninstall on the replica in the uninstall log.
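To spot these handled failures in a run without reading the whole uninstall log, a small scanner along these lines could help. It assumes the `DEBUG Attempt N to ...` message format quoted above and the "Some certificates may still be tracked by certmonger." warning mentioned earlier; `summarize_uninstall_log` is a hypothetical helper, not part of FreeIPA or its test suite.

```python
import re
from collections import defaultdict

# Matches DEBUG lines in the format quoted above, e.g.
# "2021-02-10T16:02:31Z DEBUG Attempt 2 to remove certmonger CAs"
ATTEMPT_RE = re.compile(r"DEBUG Attempt (\d+) to (.+)$")
LEFTOVER = "Some certificates may still be tracked by certmonger."


def summarize_uninstall_log(lines):
    """Return (retried, leftover) for an iterable of log lines.

    retried maps each operation that needed more than one attempt to the
    highest attempt number seen; leftover is True if the warning about
    remaining certmonger tracking appears anywhere in the log.
    """
    highest = defaultdict(int)
    leftover = False
    for line in lines:
        line = line.rstrip("\n")
        if LEFTOVER in line:
            leftover = True
        m = ATTEMPT_RE.search(line)
        if m:
            op = m.group(2)
            highest[op] = max(highest[op], int(m.group(1)))
    retried = {op: n for op, n in highest.items() if n > 1}
    return retried, leftover
```

Since it accepts any iterable of lines, an open file object can be passed directly. A retried operation with `leftover` still False matches the "request succeeded but the reply was lost" case described above.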

@rcritten
Contributor Author

I'm going to try to generalize this test loop. https://pagure.io/freeipa/issue/8470 shows the same dbus error on a fresh installation as well.

@rcritten rcritten force-pushed the issue_8533 branch 4 times, most recently from 4d4ba5e to 5d8e80f Compare February 10, 2021 22:23
@rcritten
Contributor Author

So. Maybe it's been luck, but I made another change so that certmonger is not restarted during uninstall, and that seems to have helped. I just ran a set with only that change and it passed. I'll kick it again once Azure is done.

@rcritten rcritten force-pushed the issue_8533 branch 4 times, most recently from 716e42f to 1087633 Compare February 11, 2021 13:11
@rcritten rcritten added ipa-4-9 Mark for backport to ipa 4.9 needs review Pull Request is waiting for a review and removed WIP Work in progress - not ready yet for review labels Feb 11, 2021
@rcritten
Contributor Author

Dropping WIP label, this seems to have fixed the uninstallation dbus-related issue.

@rcritten rcritten changed the title WIP: Re-order certmonger CA helper removal before services are down Don't stop certmonger during IPA uninstallation Feb 11, 2021
@abbra
Contributor

abbra commented Feb 14, 2021

LGTM. Please remove temp commit.

@abbra abbra removed the needs review Pull Request is waiting for a review label Feb 15, 2021
This option was inconsistent between invocations and there is
no need to stop certmonger after stopping tracking. It was also
apparently causing dbus timeout errors, probably due to the amount
of work that certmonger does at startup.

https://pagure.io/freeipa/issue/8506
https://pagure.io/freeipa/issue/8533

Signed-off-by: Rob Crittenden <rcritten@redhat.com>
@rcritten
Contributor Author

Dropped the temp commits.

@abbra
Contributor

abbra commented Feb 15, 2021

ACK.

@abbra abbra added ack Pull Request approved, can be merged pushed Pull Request has already been pushed labels Feb 15, 2021
@abbra
Contributor

abbra commented Feb 15, 2021

master:

  • 71047f6 Remove the option stop_certmonger from stop_tracking_*

Labels
ack Pull Request approved, can be merged ipa-4-9 Mark for backport to ipa 4.9 pushed Pull Request has already been pushed
4 participants