Don't stop certmonger during IPA uninstallation #5505
It passed, re-running.
Passed five times, re-running.
@flo-renaud do you think there is a better test to exercise this re-ordering of the uninstall?
@rcritten test_cert_fix and test_replica_promotion_TestHiddenReplicaPromotion are probably the jobs that fail the most frequently. But I think the right ticket is https://pagure.io/freeipa/issue/8506 for this issue (8533 happens during replica installation while configuring the cert tracking).
Thanks for the suggestion, updating the test. I'll have to remember to go back to fix the ticket number.
Both invocations of test_integration/test_replica_promotion.py::TestHiddenReplicaPromotion failed due to DNS issues. Re-running.
Two more failures, unrelated to dbus.
Trying test_integration/test_caless.py::TestReplicaCALessToCAFull to provoke the failure. If I can't, I may drop the change to see whether I can reproduce this more easily, and then add the change back in to see whether it actually did anything.
Rebased and dropped the change to see if one or more of the tests fail.
temp_commit_2 reproduced the uninstall dbus error: [ipatests.pytest_ipa.integration.host.Host.master.cmd54] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
All five executions of test_ipa_cert_fix.py failed with the dbus error with the re-organized code change applied.
8f431a3 to e82d7ea
I think I may have it worked out, but I've heavily redone the patch, so we'll see. Originally I was trying to move the certmonger CA removal ahead of shutting down IPA. I've dropped that and gone back to the original code in cainstance.py, but wrapped it in a loop. dbus is socket-activated, and I think it sometimes takes longer to start up than certmonger is willing to wait. So stick the call into a loop and try up to 5 times. In testing it never took more than 2 tries, with no sleep.
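The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `BusTimeout`, `retry_on_timeout`, and `flaky_stop_tracking` are hypothetical names, and the real code in cainstance.py talks to certmonger over dbus rather than to a stub.

```python
class BusTimeout(Exception):
    """Hypothetical stand-in for org.freedesktop.DBus.Error.NoReply."""


def retry_on_timeout(operation, attempts=5):
    """Call operation() until it returns, retrying only on BusTimeout.

    No sleep between tries, matching the comment above. The last
    BusTimeout is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except BusTimeout:
            if attempt == attempts:
                raise


# Simulate a bus that fails to reply once before answering.
calls = []


def flaky_stop_tracking():
    calls.append(1)
    if len(calls) < 2:
        raise BusTimeout("Did not receive a reply")
    return "stopped"


result = retry_on_timeout(flaky_stop_tracking)
```

Only the timeout is retried here; any other exception propagates immediately, so unrelated failures are not masked by the loop.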
What I observed in the last set of logs is that the request to stop tracking failed with a DBus timeout, but the request was actually successful. Tune up the logging around this so it is clearer what went on. We would see a message like "Some certificates may still be tracked by certmonger." in the server uninstall log if there were remaining certificates. I haven't seen this in these dbus timeout failure cases.
For the reviewer: There are two categories of errors here:
A handled failure looks like:
024-02-11T15:50:57Z DEBUG Attempt 1 to stop tracking for 'auditSigningCert cert-pki-ca in /etc/pki/pki-tomcat/alias'
What you see here is the first attempt failing, it being caught, and a second attempt only to find out that the path doesn't exist. This means that the original request was successful but not communicated back.
A success on the first try looks like:
2021-02-10T16:02:37Z DEBUG Attempt 1 to stop tracking for /var/kerberos/krb5kdc/kdc.crt
I tend to see this failure in test_integration-test_ipa_cert_fix.py-TestIpaCertFix-uninstall. You'll find the entries in the uninstall log.
The other error happens when trying to remove the IPA-created CA's in certmonger. Similarly we don't get a response back but the retry succeeds without a not found error.
A success looks like:
A handled failure looks like:
I tend to see this most in test_integration-test_caless.py-TestReplicaCALessToCAFull-uninstall on the replica in the uninstall log.
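To make the first category concrete, here is a rough sketch of how a lost reply followed by a "not found" on the retry can be treated as success. Everything named here (`BusTimeout`, `NotTracked`, `FakeCertmonger`, `stop_tracking_with_retry`) is hypothetical and only simulates the behaviour described above; re-raising a "not found" that was not preceded by a timeout is an assumption of the sketch, not something the patch is known to do.

```python
class BusTimeout(Exception):
    """Hypothetical stand-in for the dbus NoReply error."""


class NotTracked(Exception):
    """Hypothetical stand-in for 'the request path does not exist'."""


def stop_tracking_with_retry(stop_tracking, nickname, attempts=5, log=print):
    timed_out = False
    for attempt in range(1, attempts + 1):
        log("Attempt %d to stop tracking for %s" % (attempt, nickname))
        try:
            stop_tracking(nickname)
            return "stopped"
        except BusTimeout:
            timed_out = True
            if attempt == attempts:
                raise
        except NotTracked:
            if timed_out:
                # The earlier request went through; only its reply was lost.
                return "stopped earlier"
            raise


class FakeCertmonger:
    """Removes the request, then loses the reply exactly once."""

    def __init__(self, tracked):
        self.tracked = set(tracked)
        self.drop_next_reply = True

    def stop_tracking(self, nickname):
        if nickname not in self.tracked:
            raise NotTracked(nickname)
        self.tracked.remove(nickname)
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise BusTimeout("Did not receive a reply")


messages = []
fake = FakeCertmonger({"kdc.crt"})
outcome = stop_tracking_with_retry(
    fake.stop_tracking, "kdc.crt", log=messages.append
)
```

In this simulation the log shows two attempts and nothing remains tracked afterwards, which mirrors the handled-failure pattern in the uninstall log.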
I'm going to try to generalize this test loop. https://pagure.io/freeipa/issue/8470 shows this dbus error also on a fresh installation.
4d4ba5e to 5d8e80f
So. Maybe it's been luck, but I made another change so that certmonger is not restarted during uninstall, and that seems to have helped. I just ran a set with only that change and it passed. I'll kick it again once Azure is done.
716e42f to 1087633
Dropping WIP label, this seems to have fixed the uninstallation dbus-related issue.
LGTM. Please remove temp commit.
This option was inconsistent between invocations and there is no need to stop certmonger after stopping tracking. It was also apparently causing dbus timeout errors, probably due to the amount of work that certmonger does at startup.

https://pagure.io/freeipa/issue/8506
https://pagure.io/freeipa/issue/8533

Signed-off-by: Rob Crittenden <rcritten@redhat.com>
Dropped the temp commits.
ACK.
master:
An effort to prevent CI failures.
https://pagure.io/freeipa/issue/8533
Signed-off-by: Rob Crittenden <rcritten@redhat.com>