systemd timeout on cloud-init-final job #1

Open
asalkeld opened this Issue Jun 17, 2012 · 7 comments

Contributor

asalkeld commented Jun 17, 2012

Depending on network speed and the complexity of the userdata script, the cloud-init service can time out, causing the job to be terminated and leaving the instance in an undefined state.
Short term: set TimeoutSec= using sed?

File a Fedora bug so it can be fixed there (or upstream).

Contributor

tomassedovic commented Jun 22, 2012

Adding more info based on IRC discussion with @asalkeld:

On Fedora guests, the cloud-init script that executes what we pass via UserData is here:

http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/systemd/cloud-final.service

If the script takes too long (e.g. we're installing a lot of packages on a slow network) systemd kills it off. The default timeout is 90s.

So the quick fix would be to increase the timeout (Angus suggested 15 minutes) when building the JEOS. Long-term, we should try to get that change into cloud-init upstream.

Right now, systemd ignores the TimeoutSec setting for oneshot services, though:

https://bugzilla.redhat.com/show_bug.cgi?id=761656

There's a fix, but it has not been merged and released yet.

Contributor

tomassedovic commented Jun 22, 2012

I wanted to provide some details when I open the bug with cloud-init, but I'm having trouble reproducing this.

So I put these lines into the Wordpress_Single_instance userdata:

"echo $(date) before sleep >> /tmp/thingy\n",
"sleep 500\n",          
"echo $(date) after sleep >> /tmp/thingy\n",

(I put this right below the hashbang, before the cfn-init call.) If systemd were killing the process after 90 seconds, the second message should not get written, but it does.
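For anyone trying the same reproduction outside a template, the three JSON-escaped lines above unescape to roughly the following shell. This is only a sketch: `run_sleep_probe` and `probe_survived` are hypothetical helper names, and the duration and log path are made into parameters purely for illustration (the original used a fixed `sleep 500` and `/tmp/thingy`).

```shell
#!/bin/sh
# Sketch of the reproduction probe. run_sleep_probe and probe_survived are
# hypothetical helpers, not part of cloud-init or cfntools.

# Write a timestamped marker, sleep, then write a second marker.
# $1 = seconds to sleep, $2 = log file to append to.
run_sleep_probe() {
    sleep_secs=$1
    log_file=$2
    echo "$(date) before sleep" >> "$log_file"
    sleep "$sleep_secs"
    echo "$(date) after sleep" >> "$log_file"
}

# Succeeds only if the second marker was written, i.e. the script
# was not killed while it was sleeping.
probe_survived() {
    grep -q 'after sleep$' "$1"
}

# Equivalent of the original userdata lines:
#   run_sleep_probe 500 /tmp/thingy
```

If the "after sleep" line is missing once the instance has settled, the script was terminated during the sleep.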

My jeos was Fedora 16 64bit, cfntools.

@asalkeld can you see what I'm doing wrong? Does this time out as well on your setup?

Contributor

asalkeld commented Jun 23, 2012

Hi Tomas, I'll test your script and let you know.

imain commented Jun 26, 2012

I get the same thing:

Jun 26 00:36:45 localhost systemd[1]: cloud-final.service operation timed out. Terminating.
Jun 26 00:36:45 localhost systemd[1]: Unit cloud-final.service entered failed state.

Can we not add a timeout to the systemd config for cloud-final.service?
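To check whether a guest hit the same failure, one option is to search its system log for the message quoted above. A minimal sketch, assuming the log is available as a plain text file; `unit_timed_out` is a hypothetical helper, and the log location is an assumption that depends on the image.

```shell
#!/bin/sh
# unit_timed_out is a hypothetical helper: it succeeds if the given log file
# contains systemd's "operation timed out" message for the named unit.
# $1 = log file, $2 = unit name (e.g. cloud-final.service).
unit_timed_out() {
    log_file=$1
    unit=$2
    # -F treats the pattern as a fixed string, so the dot in the unit
    # name is matched literally.
    grep -F -q "$unit operation timed out" "$log_file"
}

# Example call (the log path is an assumption):
#   unit_timed_out /var/log/messages cloud-final.service && echo "timed out"
```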

imain commented Jun 26, 2012

This happens for me in both F16 and F17. Putting TimeoutSec=0 in the systemd config fixes it in both.
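The sed approach suggested at the top of this issue could look something like the sketch below. It is not necessarily what the referenced commit does: `set_timeout_sec` is a hypothetical helper, the unit file here is a scratch example rather than the real cloud-final.service, and `sed -i` with a one-line `a` command assumes GNU sed (which Fedora ships).

```shell
#!/bin/sh
# Sketch: set TimeoutSec in a systemd unit file with GNU sed.
# set_timeout_sec is a hypothetical helper, not part of cloud-init or systemd.
# $1 = unit file, $2 = timeout value (0 disables the timeout).
set_timeout_sec() {
    unit_file=$1
    value=$2
    if grep -q '^TimeoutSec=' "$unit_file"; then
        # A setting already exists: rewrite it in place.
        sed -i "s/^TimeoutSec=.*/TimeoutSec=$value/" "$unit_file"
    else
        # No setting yet: add one directly after the [Service] header.
        sed -i "/^\[Service\]/a TimeoutSec=$value" "$unit_file"
    fi
}

# Demonstrate on a scratch unit file (illustrative content only).
demo_unit=$(mktemp)
cat > "$demo_unit" <<'EOF'
[Unit]
Description=Example oneshot job

[Service]
Type=oneshot
ExecStart=/bin/true
EOF

set_timeout_sec "$demo_unit" 0
cat "$demo_unit"
```

When building an image, the same call would be pointed at the installed cloud-final.service file; its path is left as an assumption here. On an already running system, `systemctl daemon-reload` is needed before the change takes effect.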

sdake added a commit that referenced this issue Jun 26, 2012

Set TimeoutSec to zero in fedora systemd files (don't timeout)
Fixes issue #1

Signed-off-by: Steven Dake <sdake@redhat.com>

@ghost ghost assigned sdake Jun 26, 2012

Owner

sdake commented Jun 26, 2012

Tomas,

Can you close with the Fedora maintainer of cloud-init to make a permanent solution for the packages in F16/F17?

Thanks
-steve

Contributor

tomassedovic commented Jun 28, 2012
