CA-less container setup fails to upgrade after new docker image pull #276

Open
jdel opened this issue Jul 31, 2019 · 3 comments

@jdel
commented Jul 31, 2019

I have a couple of FreeIPA replicas running in containers, each using a local persistent disk bind-mounted at /data.

The servers run CoreOS stable and are treated as ephemeral: they can be destroyed entirely and respawned with the same disk and Ignition configuration.

This strategy has worked so far with a variety of containers, but I am facing issues with FreeIPA.

The initial master is set up with:

--domain=my.domain.com
--realm=MY.DOMAIN.COM
--ds-password=REDACTED
--admin-password=REDACTED
--unattended
--no-pkinit
--http-cert-file=/data/star.my.domain.com.crt
--dirsrv-cert-file=/data/star.my.domain.com.crt
--dirsrv-pin=
--http-pin=
--no-ntp
--no-sshd
--no-ssh

The replicas are set up with:

--server=ldap2.my.domain.com
--domain=my.domain.com
--realm=MY.DOMAIN.COM
--unattended
--principal=register-user
--admin-password=REDACTED
--no-pkinit
--http-cert-file=/data/star.my.domain.com.crt
--dirsrv-cert-file=/data/star.my.domain.com.crt
--dirsrv-pin=
--http-pin=
--no-ntp
--no-sshd
--no-ssh
--force-join

I only want to use LDAP and the web UI, with a wildcard certificate bought online.
Everything works as intended until a new freeipa/freeipa-server:fedora-29 image is pulled and the container is restarted.

The upgrade process kicks in and fails with:

ipa-server-configure-first.log

Wed Jul 31 14:17:18 UTC 2019 /usr/local/sbin/init
Wed Jul 31 14:17:18 UTC 2019 /usr/sbin/ipa-server-configure-first upgrade
/data/build-id /data-template/build-id differ: byte 1, line 1
FreeIPA server is already configured but with different version, starting upgrade.
sed: can't read /etc/sysconfig/pki-tomcat: No such file or directory
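
The "differ" line in the log above suggests that the upgrade is triggered when /data/build-id no longer matches the build-id shipped in the image under /data-template. A minimal sketch of that comparison, assuming both files exist at the paths shown in the log; the function name is hypothetical and this is not the container's actual init code:

```python
import filecmp
from pathlib import Path


def upgrade_would_trigger(data_dir: str = "/data",
                          template_dir: str = "/data-template") -> bool:
    """Return True when the two build-id files differ byte for byte.

    The default paths are taken from the log excerpt; the function itself
    is only an illustration of the comparison, not the container's code.
    """
    current = Path(data_dir) / "build-id"
    shipped = Path(template_dir) / "build-id"
    # shallow=False forces a content comparison instead of trusting os.stat().
    return not filecmp.cmp(current, shipped, shallow=False)
```

Running this inside the container before a restart would indicate whether the next start is likely to enter the upgrade path.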

Looking at the code here, I still do not understand why sed runs at all, as my /usr/share directory only contains a directory called ipa.

Creating the missing file with touch lets the upgrade process continue.
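
A minimal sketch of that workaround, assuming it runs inside the container before the upgrade is retried. The helper name is hypothetical, and it only creates an empty placeholder; it does not explain why sed is invoked on a CA-less setup:

```python
from pathlib import Path


def ensure_placeholder(path: str = "/etc/sysconfig/pki-tomcat") -> bool:
    """Create an empty file at `path` if nothing exists there.

    Returns True if the file was created, False if it was already present.
    The default path is the one named in the sed error message.
    """
    target = Path(path)
    if target.exists():
        return False
    # Create missing parent directories so touch() cannot fail on them.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()
    return True
```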

So the upgrade kicks in and I can see a few "upgrade complete" messages before it all ends abruptly with:

[Verifying that KDC configuration is using ipa-kdb backend]
CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'certmonger.service'] returned non-zero exit status 1: 'Job for certmonger.service failed because a timeout was exceeded.\nSee "systemctl status certmonger.service" and "journalctl -xe" for details.\n')

ipaupgrade.log

2019-07-31T14:35:30Z DEBUG stderr=
2019-07-31T14:35:30Z DEBUG Starting external process
2019-07-31T14:35:30Z DEBUG args=['/bin/systemctl', 'start', 'certmonger.service']

2019-07-31T14:38:30Z DEBUG Process finished, return code=1
2019-07-31T14:38:30Z DEBUG stdout=
2019-07-31T14:38:30Z DEBUG stderr=Job for certmonger.service failed because a timeout was exceeded.
See "systemctl status certmonger.service" and "journalctl -xe" for details.

2019-07-31T14:38:30Z ERROR IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
2019-07-31T14:38:30Z DEBUG   File "/usr/lib/python3.7/site-packages/ipapython/admintool.py", line 179, in execute
    return_value = self.run()
  File "/usr/lib/python3.7/site-packages/ipaserver/install/ipa_server_upgrade.py", line 54, in run
    server.upgrade()
  File "/usr/lib/python3.7/site-packages/ipaserver/install/server/upgrade.py", line 2153, in upgrade
    upgrade_configuration()
  File "/usr/lib/python3.7/site-packages/ipaserver/install/server/upgrade.py", line 1898, in upgrade_configuration
    http.configure_certmonger_renewal_guard()
  File "/usr/lib/python3.7/site-packages/ipaserver/install/httpinstance.py", line 293, in configure_certmonger_renewal_guard
    certmonger.start()
  File "/usr/lib/python3.7/site-packages/ipaplatform/base/services.py", line 302, in start
    skip_output=not capture_output)
  File "/usr/lib/python3.7/site-packages/ipapython/ipautil.py", line 574, in run
    p.returncode, arg_string, output_log, error_log

2019-07-31T14:38:31Z DEBUG The ipa-server-upgrade command failed, exception: CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'certmonger.service'] returned non-zero exit status 1: 'Job for certmonger.service failed because a timeout was exceeded.\nSee "systemctl status certmonger.service" and "journalctl -xe" for details.\n')
2019-07-31T14:38:31Z ERROR Unexpected error - see /var/log/ipaupgrade.log for details:
CalledProcessError: CalledProcessError(Command ['/bin/systemctl', 'start', 'certmonger.service'] returned non-zero exit status 1: 'Job for certmonger.service failed because a timeout was exceeded.\nSee "systemctl status certmonger.service" and "journalctl -xe" for details.\n')
2019-07-31T14:38:31Z ERROR The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

The certmonger journal contains the following:

ldap2.my.domain.com systemd[1]: Starting Certificate monitoring and PKI enrollment...
ldap2.my.domain.com bash[564]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
ldap2.my.domain.com bash[564]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
ldap2.my.domain.com bash[564]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
ldap2.my.domain.com systemd[1]: certmonger.service: Start-post operation timed out. Stopping.
ldap2.my.domain.com systemd[1]: certmonger.service: State 'stop-sigterm' timed out. Killing.
ldap2.my.domain.com systemd[1]: certmonger.service: Killing process 561 (certmonger) with signal SIGKILL.
ldap2.my.domain.com systemd[1]: certmonger.service: Killing process 604 (certmonger) with signal SIGKILL.
ldap2.my.domain.com systemd[1]: certmonger.service: Main process exited, code=killed, status=9/KILL
ldap2.my.domain.com systemd[1]: certmonger.service: Killing process 604 (certmonger) with signal SIGKILL.
ldap2.my.domain.com systemd[1]: certmonger.service: Failed with result 'timeout'.
ldap2.my.domain.com systemd[1]: Failed to start Certificate monitoring and PKI enrollment.
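
The journal excerpt above shows a recognizable pattern: repeated D-Bus NoReply errors, then a start-post timeout, then the unit failing with result 'timeout'. A minimal sketch for spotting that pattern in saved journal output, assuming the message strings match the excerpt exactly; the names `CertmongerDiagnosis` and `diagnose` are hypothetical:

```python
from typing import Iterable, NamedTuple


class CertmongerDiagnosis(NamedTuple):
    dbus_no_reply: int         # number of D-Bus NoReply errors seen
    start_post_timeout: bool   # systemd reported a start-post timeout
    failed_with_timeout: bool  # the unit ended with result 'timeout'


def diagnose(lines: Iterable[str]) -> CertmongerDiagnosis:
    """Scan journal lines for the certmonger timeout signature."""
    no_reply = 0
    start_post = False
    failed = False
    for line in lines:
        if "org.freedesktop.DBus.Error.NoReply" in line:
            no_reply += 1
        if "certmonger.service: Start-post operation timed out" in line:
            start_post = True
        if "certmonger.service: Failed with result 'timeout'" in line:
            failed = True
    return CertmongerDiagnosis(no_reply, start_post, failed)
```

The lines could come, for example, from `journalctl -u certmonger.service --no-pager` captured to a file and read line by line.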

I should also mention that I am concerned about how the Docker images are tagged for use outside of a test environment, as already raised in #246, but I understand the difficulty you are facing.

The only way I can get a replica working again after a Docker image update is to delete the affected replica from another working master, delete its volume on disk, and restart the replica from scratch. This is far from ideal, as it involves manual steps and does not appear very robust.

Could you assist with the troubleshooting and advise on potential solutions?

Thanks in advance

@jdel

Author

commented Jul 31, 2019

After further investigation, the upgrade succeeds with the freeipa/freeipa-server:fedora-30 docker image, which leads me to believe the fedora-29 tag has some sort of issue.

I also tried to start a brand-new master with fedora-29 and the flags provided above, and it fails with the same certmonger issue.

@adelton

Collaborator

commented Aug 8, 2019

We've seen that certmonger problem for quite some time. It's been tracked as https://bugzilla.redhat.com/show_bug.cgi?id=1656519.

@jdel

Author

commented Aug 12, 2019

Is there a known workaround when this issue happens? My only option was to delete the replica completely and recreate it from scratch.

Also, what is the role of certmonger in a CA-less install like mine? I already provide a full-chain PKCS#12 file to FreeIPA.
