perl-CAF update should trig a ncm-cdispd restart #89

Open
jouvin opened this issue May 7, 2015 · 8 comments
Comments

@jouvin
Contributor

jouvin commented May 7, 2015

During deployment of 15.4.0-rc1 (quattor/release#85), I was (again) in a situation where the upgrade basically works but ncm-ncd fails during the update itself. Below are the last lines logged by ncm-cdispd during the upgrade:

2015/05/07-17:34:04 [WARN] ncm-ncd finished with status: 255 (ec 65280, some configuration modules failed to run successfully)
2015/05/07-17:34:46 [WARN] Component ccm failed with message: cannot execute configure on component ccm
2015/05/07-17:34:46 [WARN] Component network failed with message: cannot execute configure on component network
2015/05/07-17:34:46 [WARN] ncm-ncd finished with status: 255 (ec 65280, some configuration modules failed to run successfully)
2015/05/07-17:34:46 [INFO] Processing delayed signal TERM
2015/05/07-17:34:46 [WARN] signal handler: signal TERM received
2015/05/07-17:34:46 [WARN] terminating ncm-cdispd...

After another deploy or a restart of ncm-cdispd, the next run is OK:

2015/05/07-17:45:04 [INFO] new profile arrived, examining...
updated /var/lib/ccm/profile.414/ccm-active-profile.414-2520
2015/05/07-17:45:04 [INFO] new profile identical but re-running ncm-ncd since last execution reported errors
2015/05/07-17:45:04 [INFO] about to run: /usr/sbin/ncm-ncd --configure profile grub network ccm vomsclient hostsaccess sysconfig altlogrotate aiiserver named modprobe dirperm spma dpmlfc cdp ldconf accounts filecopy xrootd mkgridmap iptables gip2 mysql cron etcservices nrpe chkconfig gridmapdir ntpd useraccess sudo lcgbdii --state /var/run/quattor-components
2015/05/07-17:45:43 [INFO] ncm-ncd finished with status: 0 (ec 0, all configuration modules ran successfully)

This is not a blocking issue, as the next deployment after the update completes successfully. But I think it happens when perl-CAF is updated without ncm-cdispd being restarted: ncm-ncd tries to use a CAF version that has been removed since it started, and this causes some side effects. I'd suggest adding a post-install script that restarts ncm-cdispd as part of the perl-CAF upgrade.

@stdweird
Member

stdweird commented May 7, 2015

what is sending the TERM signal?
and what do you mean by because ncm-ncd tries to use a CAF version removed since it has started?

i'll see if i can reproduce this. why would ncm-cdispd want to reopen the CAF files? it should have already parsed them and have the code in memory? (or we should try to force keeping the CAF filehandles open somehow; but i don't know how perl accesses modules during runtime)

@jouvin
Contributor Author

jouvin commented May 7, 2015

The TERM signal is sent by the post-install scripts of ncm-cdispd and ncm-ncd (at least), and probably ccm. The problem is that ncm-cdispd completes what it is doing before really shutting down (deferred signal processing), but the new ncm-cdispd, and thus the new ncm-ncd, starts immediately (before the previous one has shut down). This is harmless because the ncm-ncd lock prevents it from running components until the previous one has finished. But it means that ncm-ncd may start with a version of CAF that has not yet been updated (depending on the order in which YUM updates things). Why this causes a problem is not really clear to me, unless ncm-ncd itself requires the CAF changes.
Just adding the proposed restart will ensure that, during the upgrade, ncm-ncd is started at least once by ncm-cdispd with the updated version of CAF (as the restart occurs during the post-install script). Without this, we sometimes need one more deploy (as in the example I gave).

@jouvin
Contributor Author

jouvin commented May 7, 2015

For me, this problem explains the issue opened against ncm-ncd that we were not able to reproduce: quattor/ncm-ncd#29.

@stdweird
Member

stdweird commented May 7, 2015

what if we can take the lock in the BEGIN of main in ncm-ncd? that way only minimal CAF code is loaded (eg not even NCD::ComponentProxyList is read); and i think BEGIN implies that the rest is not even read/loaded.

(the usage of NCD::ComponentProxyList before the lock is checked, as the code currently does, can be achieved if the list option implies ignorelock by default)
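
The "take the lock first, load everything else afterwards" ordering can be sketched in shell. This is only an analogy under stated assumptions: ncm-ncd is Perl and its real locking is not shown here, and the names `acquire_lock`, `release_lock` and `run_with_lock`, as well as the `mkdir`-based lock directory, are hypothetical.

```shell
# Hypothetical sketch: hold an exclusive lock before doing any real work.
# mkdir is atomic, so only one caller can create the lock directory.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

# Usage: run_with_lock LOCKDIR COMMAND [ARGS...]
run_with_lock() {
    lockdir="$1"
    shift
    # Wait until the previous holder has finished and released the lock.
    until acquire_lock "$lockdir"; do
        sleep 1
    done
    # Only now run the payload; anything it loads is read after the
    # previous run (and any package update it raced with) has completed.
    "$@"
    rc=$?
    release_lock "$lockdir"
    return "$rc"
}
```

The point of the ordering is that nothing beyond the minimal locking code is loaded until the lock is held, so the payload sees whatever version of its libraries is installed at that moment.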

@jouvin
Contributor Author

jouvin commented May 7, 2015

I agree that this is something to review. Anyway, I still think that adding the (conditional) restart during RPM installation may help to catch the corner cases...

@jouvin jouvin added this to the 15.4 milestone May 7, 2015
@jouvin
Contributor Author

jouvin commented May 7, 2015

I set the milestone to 15.4, but I have no problem if we decide to postpone it to 15.6...

@stdweird
Member

stdweird commented May 8, 2015

i don't like adding this, it needs a proper fix imho.
for this reason we have chkconfig as postdep of spma (but then chkconfig doesn't do what we want 😄 )

anyway, as long as it's not done like the ccm postinstall script (that one uses pgrep, which is not included as a rpm requirement)

@jouvin
Contributor Author

jouvin commented May 8, 2015

It's true that the ncm-cdispd startup file doesn't support cond_restart, which is a problem...

I opened an issue for CCM (quattor/CCM#53), as the use of pgrep is clearly inappropriate (but it is also a workaround for the absence of cond_restart).
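
A minimal sketch of what a conditional restart in a post-install script could look like without pgrep. It assumes the init script implements a `status` action that returns 0 only when the daemon is running (not confirmed in this thread), and the helper name `cond_restart` and the `SERVICE_CMD` override are hypothetical.

```shell
# Hypothetical helper: restart a service only if it is currently running.
# SERVICE_CMD is an assumption for testing; it defaults to "service".
cond_restart() {
    svc="$1"
    cmd="${SERVICE_CMD:-service}"
    # Assumes "<cmd> <svc> status" exits 0 only when the daemon is running.
    if "$cmd" "$svc" status >/dev/null 2>&1; then
        "$cmd" "$svc" restart
    else
        echo "$svc not running, skipping restart"
    fi
}
```

In a post-install script this would be called as `cond_restart ncm-cdispd`, so a daemon that was deliberately stopped is not started by the package upgrade.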

@jouvin jouvin modified the milestones: 15.6, 15.4 May 8, 2015
@jrha jrha modified the milestones: 15.10, 15.8 Aug 20, 2015
@jrha jrha modified the milestones: 16.4, 15.12 Nov 26, 2015
@jrha jrha removed this from the 16.4 milestone Dec 12, 2015
3 participants