New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: Allow setting of timeout for execution of action commands #190

Closed
wants to merge 2 commits into
base: 0.9
from

Conversation

Projects
None yet
3 participants
@kwirk
Contributor

kwirk commented Apr 27, 2013

This uses subprocess.Popen, polling until timeout seconds has passed
or the command has exit. If the command has not exited, fail2ban then
sends SIGTERM, and if this is unsuccessful, SIGKILL.

The timeout can be changed for an entire action via action [Init]
options, or via jail.conf override, or fail2ban-client. The default
timeout period is 60 seconds.

Reference #60

NF: Allow setting of timeout for execution of action commands
This uses subprocess.Popen, polling until `timeout` seconds has passed
or the command has exit. If the command has not exited, fail2ban then
sends SIGTERM, and if this is unsuccessful, SIGKILL.

The timeout can be changed for an entire action via action [Init]
options, or via jail.conf override, or fail2ban-client. The default
timeout period is 60 seconds.
@coveralls

This comment has been minimized.

coveralls commented Apr 27, 2013

Coverage Status

Coverage remained the same when pulling d07df66 on kwirk:cmd-timeout into bddbf1e on fail2ban:0.9.

if msg:
logSys.info("HINT on %x: %s"
% (retcode, msg % locals()))
popen = subprocess.Popen(realCmd, shell=True)

This comment has been minimized.

@yarikoptic

yarikoptic Apr 28, 2013

Member

If you are at it -- shouldn't we "swallow" all the stdout/stderr and then (may be) spit it out as a part of a debug message (if any gets received)?

This comment has been minimized.

@kwirk

kwirk Apr 28, 2013

Contributor

@yarikoptic Good idea. Looking at the python docs, it appears that the communicate blocks, but calling poll can have the issue with the processes blocking due to IO buffer being full if stdout and/or stderr is being captured, forcing use of communicate.
Therefore, how about only capturing stdout and stderr if debug logging is enabled and then only calling communicate if retcode is non-zero?

This comment has been minimized.

@yarikoptic

yarikoptic Apr 29, 2013

Member

wouldn't calling communicate only if retcode is non-zero would be "too late"? i.e. we first should have to have Popen redirecting to pipes. also those would require additional threads while communicate'ing -- and we have already so many ;)

I wondered -- may be it would be worth to create/use a temporary file which would be provided as stdout/stderr and thus collect all the output... that would prevent anything dumped to the screen, and it could then be read back in case of 'debug' level and failure... somewhat ugly but should avoid any additional threading, buffers locking , etc... and since we lock for execution of the command -- should be thread safe to re-use a single file for that purpose... what do you think?

This comment has been minimized.

@kwirk

kwirk Apr 29, 2013

Contributor

Sounds good. I've created two version, one with global utilising the locks, and one just creating local variable. I prefer local variables as I think this is cleaner, and the only notable creation/deletion overhead of IO is only during start up or shutdown of fail2ban. Luckily, the tempfile.TemporaryFile1 handles all the deletion once the file is closed 😄

https://gist.github.com/kwirk/5484839
https://gist.github.com/kwirk/5484915

What do you think?

This comment has been minimized.

@kwirk

kwirk Apr 29, 2013

Contributor

Should probably add that my concern with the global file, is if a command forks, or isn't killed, it might keep hold of the file descriptor, and then end up in pickle, hence creating new files for every execution.

This comment has been minimized.

@yarikoptic

yarikoptic Apr 30, 2013

Member

yeah =-- good point about global file solution

as for local one -- why even bother to read it if no need (not debug and retcode is 0)?

btw --

"%s" % (`bu`,)
"%r" % (bu, )

should be identical ;)

This comment has been minimized.

@yarikoptic

yarikoptic Apr 30, 2013

Member

"the only notable creation/deletion overhead of IO is only during start up or shutdown of fail2ban"
I am not sure on that though -- we also have those actioncheck commands etc

Now I wonder (just thinking out loud) -- may be this swallowing/redirection should indeed be optional -- i.e. an option to the action (and/or if debug level), and otherwise just keep dumping everything to stdout as it did before (stdout=None, ...)?

eventually someone needs to start crafting a benchmark battery.
May be using https://github.com/pydata/vbench or do you know something more established ?
I could find a host where to exercise it "consistently"

This comment has been minimized.

@kwirk

kwirk Apr 30, 2013

Contributor

Good call about reading the output being pointless if it's not going to be logged.

I did a little test using

python2 -m timeit -s "from fail2ban.server import action"  'action.Action.executeCmd("ls -la")'

and found that with or without the changes, time took an average 103ms (note that there is at least one 100ms sleep in there, I think that hints the overhead is pretty low 😄). I suspect that any overhead will be outweighed by the actual command being executed?

The vbench stuff looks interesting. I've never looked at benchmark stuff before. I wonder, is performance that critical for something like fail2ban?

This comment has been minimized.

@yarikoptic

yarikoptic May 1, 2013

Member

performance -- depends on the deployment. For me -- primarily not. But we had already complaints where users had to come up with ad-hoc setups to run multiple fail2bans on the same box to be able to avoid GIL and still be capable to parse all the log files. sure thing those are not directly related to performance of actually running the commands, but more of parsing, but better to just keep performance in mind.

As for vbench -- it might be fun to see either we impaired parsing performance by e.g. introducing more of time formats etc.

This comment has been minimized.

@kwirk

kwirk May 1, 2013

Contributor

Hmm…I suppose the ideal would be for each jail to be it's own process? I suspect this would be very trciky without making use of the python multiprocessing library, which unfortunately requires python 2.6+. Maybe this is a feature for v1.0 😉

if retcode is None:
logSys.error("%s timed out after %i seconds." %
(realCmd, timeout))
os.kill(popen.pid, signal.SIGTERM) # Terminate the process

This comment has been minimized.

@yarikoptic

yarikoptic Apr 28, 2013

Member

it might get useful to delay error msg until you know how it gets killed and add that information (TERM or KILL signal) to the debug msg

logSys.error("Unable to kill PID %i: %s" % (popen.pid, realCmd))
elif retcode < 0:
logSys.error("%s killed with %s" %
(realCmd, signame.get(-retcode, "signal %i" % -retcode)))

This comment has been minimized.

@yarikoptic

yarikoptic Apr 28, 2013

Member

ah -- cool -- so signal gets reported here. I guess disregard above comment ;)

@kwirk

This comment has been minimized.

Contributor

kwirk commented May 1, 2013

@yarikoptic I've added the changes for the capture of stdout and stderr. Not sure what we agreed on performance side. If you want, I can change if to only capture if the level is debug.

yarikoptic added a commit to yarikoptic/fail2ban that referenced this pull request May 3, 2013

Merge branch 'cmd-timeout' (PR fail2ban#190) of https://github.com/kw…
…irk/fail2ban into 0.9

* 'cmd-timeout' of https://github.com/kwirk/fail2ban:
  NF: For action execution, log stdout and stderr if debug or cmd error
  NF: Allow setting of timeout for execution of action commands

Conflicts:
	fail2ban/client/actionreader.py
	fail2ban/server/action.py
@yarikoptic

This comment has been minimized.

Member

yarikoptic commented May 3, 2013

That is strange -- PR did not get closed although I merged manually (minor conflicts) and pushed... so closing manually

@kwirk

This comment has been minimized.

Contributor

kwirk commented May 3, 2013

@yarikoptic Thanks 😄 I think this didn't get closed as you expected by your commit, as currently it's only been merged into your own fork.

@kwirk kwirk deleted the kwirk:cmd-timeout branch Mar 15, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment