apache2 service only_if guard timeout exceeded: default.rb::line 36 #238

Closed
greenreign opened this issue Aug 15, 2014 · 23 comments

@greenreign

Script timed out running httpd -t

 Mixlib::ShellOut::CommandTimeout
           --------------------------------
           Command timed out after 2s:
           Command execeded allowed execution time, process terminated
           ---- Begin output of /usr/sbin/httpd -t ----
           STDOUT:
           STDERR:
           ---- End output of /usr/sbin/httpd -t ----
           Ran /usr/sbin/httpd -t returned

When I run the command directly on the box it returns
Syntax OK
but it always takes about 5 seconds. Can you make the timeout on the only_if guard longer or configurable? See default.rb, line 36.

@svanzoest
Contributor

Hi @greenreign,
Thank you for your report. Can you provide a bit more detail on your environment? What operating system are you on, which modules are loaded, and how many vhosts are configured? I am curious as to what would cause a syntax check to take so long to execute.

@greenreign
Author

[vagrant@default-centos-64 ~]$ cat /etc/centos-release
CentOS release 6.4 (Final)

[vagrant@default-centos-64 httpd]$ sudo httpd -S
VirtualHost configuration:
[public_ip]:80        [Host_Name](/etc/httpd/sites-enabled/ci.conf:1)
wildcard NameVirtualHosts and _default_ servers:
*:80                   is a NameVirtualHost
         default server [Host_Name] (/etc/httpd/sites-enabled/public.conf:1)
         port 80 namevhost [Host_Name] (/etc/httpd/sites-enabled/public.conf:1)
Syntax OK

[vagrant@default-centos-64 httpd]$ ls mods-enabled/
alias.conf       authz_default.load    autoindex.conf  dir.conf      log_config.load  negotiation.conf  proxy.load     status.conf
alias.load       authz_groupfile.load  autoindex.load  dir.load      logio.load       negotiation.load  rewrite.load   status.load
auth_basic.load  authz_host.load       deflate.conf    env.load      mime.conf        proxy.conf        setenvif.conf
authn_file.load  authz_user.load       deflate.load    headers.load  mime.load        proxy_http.load   setenvif.load
[vagrant@default-centos-64 httpd]$ ls mods-available/
alias.conf       authz_default.load    autoindex.conf  dir.conf      log_config.load  negotiation.conf  proxy.load     status.conf
alias.load       authz_groupfile.load  autoindex.load  dir.load      logio.load       negotiation.load  rewrite.load   status.load
auth_basic.load  authz_host.load       deflate.conf    env.load      mime.conf        proxy.conf        setenvif.conf
authn_file.load  authz_user.load       deflate.load    headers.load  mime.load        proxy_http.load   setenvif.load

@greenreign
Author

[public_ip] and [Host_Name]'s are a valid public IP and hostname.

@greenreign
Author

It's a basic default run of the recipe other than the virtual hosts and adding mod_proxy and mod_proxy_http. I'll admit I'm messing around with the virtual hosts and I don't understand them that well.

@svanzoest
Contributor

Thanks. My gut feeling is that the delay is related to the proxy setup, but I haven't written any tests for that yet.

@drpebcak added the bug label Aug 15, 2014
@greenreign
Author

See a glaring issue here?

<VirtualHost sub.example.com:80 >
  ServerName sub.example.com
  <Proxy *>
    Order allow,deny
    Allow from all
  </Proxy>
  ProxyPass / http://localhost:8080/
  ProxyPassReverse / http://localhost:8080/
</VirtualHost>

svanzoest pushed a commit that referenced this issue Aug 19, 2014
@svanzoest
Contributor

Did you find out anything more about why it takes so long to do a config test?

@greenreign
Author

Thank you. I didn't find out what was causing the slow response.

To add to the details: I did not have a problem when running on Amazon Linux from AWS. It was only too slow when running on my local CentOS Vagrant VM.
I was able to get around it when I changed the vhost entry from its hostname-based form to <VirtualHost *:80>
Perhaps it was DNS lookup?
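
One way to check the DNS theory: when a `<VirtualHost>` line names a host rather than an address, httpd has to resolve that name while it reads the configuration, so a slow resolver would show up as a slow `httpd -t`. The sketch below is plain Ruby, not part of the cookbook; `time_lookup` is a hypothetical helper that times a single lookup through the system resolver. It uses `localhost` so it runs anywhere; substitute the hostname from the vhost entry on the affected box.

```ruby
require 'socket'

# Hypothetical diagnostic helper (not part of the cookbook): time one
# hostname lookup through the system resolver.
# Returns [elapsed_seconds, resolved?].
def time_lookup(host)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  resolved = true
  begin
    Addrinfo.getaddrinfo(host, nil)
  rescue SocketError
    # The name did not resolve; the time spent waiting is still of interest.
    resolved = false
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [elapsed, resolved]
end

# Replace 'localhost' with the hostname used in the <VirtualHost> line.
elapsed, resolved = time_lookup('localhost')
puts format('lookup took %.3fs (resolved: %s)', elapsed, resolved)
```

If the lookup alone takes several seconds on the Vagrant box but not on AWS, that would be consistent with the workaround of switching to `*:80`.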

@podwhitehawk

Code above will fix that issue.
The slow response is caused by a large number of config files to test with "httpd -t" in conjunction with a slow underlying storage system.
As an example, you can try a slow 5400 RPM HDD of the kind typically found in notebooks. Try copying a lot of small files (I tested it by duplicating the RPM packages from the CentOS 6.5 DVD) while converging the apache2 cookbook at the same time.

@svanzoest
Contributor

@podwhitehawk removing the timeout means the check may spin for a significant amount of time and possibly pile up with no recourse in production. Are you saying that 10 seconds is not enough?

@podwhitehawk

@svanzoest I think that putting a timeout on the check operation is a bad idea. You have already run into that with 2 seconds, and sooner or later you will run into it again with 10 seconds.
So the cookbook should check the exit status rather than terminate the check with a timeout.
P.S. Any Linux command is very stable, so it should never spin forever.
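
To make the two positions concrete, here is a minimal sketch in plain Ruby. It is not the cookbook's code and not how Chef runs guards internally; `config_check` is a hypothetical helper. With `timeout: nil` only the exit status decides the outcome; with a timeout, a slow but otherwise valid check is reported as `:timed_out` rather than `:ok`.

```ruby
require 'timeout'

# Hypothetical helper (not the cookbook's code): run a check command and
# classify the result as :ok, :failed or :timed_out.
# With timeout: nil, the exit status alone decides the outcome.
def config_check(command, timeout: nil)
  pid = Process.spawn(*command, out: File::NULL, err: File::NULL)
  _, status = if timeout
                Timeout.timeout(timeout) { Process.wait2(pid) }
              else
                Process.wait2(pid)
              end
  status.success? ? :ok : :failed
rescue Timeout::Error
  begin
    # Stop the child and reap it so no zombie process is left behind.
    Process.kill('TERM', pid)
    Process.wait(pid)
  rescue Errno::ESRCH, Errno::ECHILD
    # The child had already exited and been reaped.
  end
  :timed_out
end

# `sleep 1` stands in for a slow but healthy `httpd -t`.
puts config_check(%w[sleep 1])                # exit status only
puts config_check(%w[sleep 1], timeout: 0.2)  # short timeout
```

The first call reports `ok` after a one-second wait; the second reports `timed_out` even though the command itself would have succeeded.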

@svanzoest
Contributor

@podwhitehawk it is more about performance and about the chef run halting, which causes later recipes in the run_list not to run. There is no downside to just moving on and trying again at the next convergence.
Also, keep in mind that this cookbook supports other platforms that are not Linux-based.

@podwhitehawk

@svanzoest it will never halt; it's not time-dependent at all. At worst the convergence will finish a minute or two later, and that is not fatal.
It's more important to have an accurate result than a failing cookbook. Do you agree?

@svanzoest
Contributor

@podwhitehawk I agree. Just need to create a test case to ensure this behavior.

@podwhitehawk

@svanzoest I've already tried that piece of code with a slow notebook drive, as I described before.
Another case I tried was inserting a sleep, like "sleep 300; httpd -t", and I couldn't get that process to fail.

@svanzoest
Contributor

@podwhitehawk we should add it as a serverspec test.

@podwhitehawk

@svanzoest I'm trying to understand how to check it, but no luck.
Any suggestions?

@svanzoest
Contributor

@podwhitehawk I would use test kitchen and update the serverspec tests in test/integration/default/serverspec to test if the chef run actually completes in a negative test.

@svanzoest
Contributor

The more I think about this, the more I wonder: what happens when the test never completes? Do we really have a case where it does not complete in 10 seconds? Ultimately there needs to be a timeout somewhere. I do not really have the time to test this out, so feel free to reopen this if someone has an example and we can actually confirm what the behavior is when the test never completes.

@docwhat

docwhat commented May 8, 2015

I'm seeing two things that can mitigate this problem. Doing both would be ideal.

One, httpd -t runs on every chef-run. Ideally, 'httpd -t' should only be run when configurations change, not every run. For example, for our in-house sshd configuration management, we create the config file(s) in the cache and run sshd -T (which is like httpd -t) on those cache files. If it passes, we copy over the new files using normal chef resources (e.g. so it won't do it if they didn't change). Iff the copy makes a change then it triggers a restart.

There are a lot of apache config files and it would be tricky to do exactly the same trick with apache, but maybe something else could be done, such as only running httpd -t when start and graceful are triggered, not for all the actions on the service.
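
The stage, validate, then copy pattern described for sshd can be sketched in plain Ruby as follows. This is an illustration under assumptions, not our in-house code and not a Chef resource: `install_if_valid` is a hypothetical helper, and the validator is any callable that takes the staged file's path and returns true or false. The validator is skipped entirely when the content has not changed, which is the point of the pattern.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: stage `content`, validate the staged copy, and
# install it at `target` only if it differs from what is already there.
# Returns true when the target was changed (the caller would then restart
# the service), false when nothing needed doing.
def install_if_valid(content, target, validator)
  # Unchanged content: skip the (possibly slow) validation altogether.
  return false if File.exist?(target) && File.read(target) == content

  Dir.mktmpdir do |dir|
    staged = File.join(dir, File.basename(target))
    File.write(staged, content)
    unless validator.call(staged)
      raise ArgumentError, "staged config for #{target} failed validation"
    end
    FileUtils.cp(staged, target)
  end
  true
end
```

For sshd the validator could be something like `->(path) { system('sshd', '-T', '-f', path, out: File::NULL, err: File::NULL) }`. For apache a single staged file is not enough, since `httpd -t` reads the whole configuration tree, which is why the same trick is harder there.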

Two, httpd -t should never "just hang" (not including things like filesystem corruption or the kernel crashing). If it is taking a long time, it is because the disk IO is slow, the VM is being moved across the country live, the system is really busy, or swap is thrashing, and in those cases even 10 seconds is too short. The point is that, in that case, chef should continue to do its job, even if it takes a while. It is up to the monitoring software to alert someone that the system is very busy. If the next Chef run fails because chef is still running from the first time, someone will get notified then.

I would suggest if you must put a timeout, set it really high, like 3 minutes. That way it acts as an absolutely last resort measure.

@sergio-bobillier

I'm experiencing this issue when I run Test Kitchen in a Vagrant/VirtualBox instance. In my test environment Apache has quite a lot of virtual hosts configured (around 360). When running the tests in an AWS instance they pass without issue, but my local machine, and thus the VirtualBox VM, is not fast enough and the converge process fails.

/usr/sbin/httpd -t is taking 12 seconds to execute instead of just 10

Can you make this timeout configurable? I would like to set it to 20 seconds on my local machine but keep it as 10 seconds on the AWS instances or when the recipe is run in an actual server.
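
A rough sketch of what I have in mind, in plain Ruby so it runs without Chef. The hashes stand in for Chef's node attributes, and `httpd_t_timeout` and `guard_timeout` are hypothetical names, not something the cookbook provides today. Chef guards do accept a `timeout` option, so the resolved value could be passed straight to `only_if`.

```ruby
# Default used when no override is set (the current hard-coded value).
DEFAULT_GUARD_TIMEOUT = 10

# Hypothetical helper: read the guard timeout from node-like attributes,
# falling back to the default. 'httpd_t_timeout' is an assumed name.
def guard_timeout(node)
  Integer(node.dig('apache', 'httpd_t_timeout') || DEFAULT_GUARD_TIMEOUT)
end

# Plain hashes standing in for Chef node attributes.
local_vm = { 'apache' => { 'httpd_t_timeout' => 20 } }
aws      = { 'apache' => {} }

puts guard_timeout(local_vm) # 20
puts guard_timeout(aws)      # 10

# In a recipe the value would be handed to the guard, for example:
#   only_if '/usr/sbin/httpd -t', timeout: guard_timeout(node)
```

The override would then live in my local Test Kitchen attributes only, and real servers would keep the default.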

@araj1

araj1 commented Aug 20, 2017

The issue still exists. Is there any way we can increase the default timeout?

@lock

lock bot commented Aug 20, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 20, 2018