
Fabric is slow because stdout/stderr read byte per byte #715

Closed
octplane opened this issue Aug 23, 2012 · 10 comments

Comments

@octplane

Hi,

(Useless) introduction and other might-be bugs

Origin of the issue

I use fabric to build a maven artefact. When run in console, this build process takes around 30s and spits around 15000 lines of log. However, when using fabric it takes around _1m30s_ to complete the same process. Painfully slow.

I use it for all my deployment and control tasks (in particular, I run chef-client inside fab), and I started wondering whether fabric was making everything slower than it already was... 😢

First, failed attempt at fixing the issue

Not being very good at optimizing, I started suspecting the pure-Python ssh library of being a bit slow, and wanted to throw in a pure C implementation of ssh to see whether it performed much better. I ended up trying libssh2, which has an incomplete Python binding. However, it's complete enough to replace ssh in the fabric code. Replacing ssh with pylibssh2 is almost straightforward, although I feel there is a missing "ssh" interface layer between fabric and its ssh library. I'm not sure whether being able to swap the ssh backend easily would be a good feature for fabric, but it would have helped me test faster.

My first tests indicate that fabric+libssh2 is as slow as fabric+ssh. This is when I suspected I was hunting for the wrong duck.

Actual Issue

The actual issue is that fabric reads its channel's stdout/stderr byte by byte. Changing line 60 of io.py from func(1) to func(1024), and ignoring all the side-effect issues, boosts the overall performance in my case.
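To see why the read size dominates, here is a minimal stand-in that times the same drain loop with both values; it uses io.BytesIO in place of a real ssh channel, and the drain helper is illustrative, not Fabric's actual code:

```python
import io
import time

def drain(channel, chunk_size):
    """Read everything from `channel`, `chunk_size` bytes at a time.

    Stand-in for the loop in io.py: `channel` is any file-like object
    with a read() method (a real ssh channel in Fabric's case).
    """
    total = 0
    while True:
        data = channel.read(chunk_size)
        if not data:  # an empty read signals EOF
            break
        total += len(data)
    return total

payload = b"x" * (1 << 20)  # 1 MiB of fake command output

for size in (1, 1024):
    start = time.perf_counter()
    n = drain(io.BytesIO(payload), size)
    print("chunk=%4d: %d bytes in %.3fs" % (size, n, time.perf_counter() - start))
```

Even with no network involved, the per-call overhead of a million read(1) calls dwarfs that of a thousand read(1024) calls.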

I guess this should be rewritten so that:

  • It reads the output in larger blocks (1024 is probably a bit much; I'd go for 128, for example)
  • It stays compatible with ssh and, in particular, handles EOF correctly
  • It still works well with the sudo prompt management (I feel this is where the complexity is!)
  • It still works with _prefix
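A chunked loop can still satisfy the prompt and EOF points above by buffering reads and checking the tail of the buffer. Here is a rough sketch under those assumptions; the names and the prompt constant are hypothetical stand-ins, not the actual io.py code:

```python
import io

SUDO_PROMPT = b"[sudo] password"  # stand-in for fabric's configured prompt

def process_stream(channel, capture, chunk_size=128):
    """Chunked read loop that still spots a trailing sudo prompt.

    `channel` is any file-like object with read(); `capture` is a
    bytearray accumulating all output, as fabric's capture does.
    Returns "prompt" if the buffered tail ends with the prompt
    (the caller would then send the password), else "eof".
    """
    buf = b""
    while True:
        data = channel.read(chunk_size)
        if not data:  # EOF
            return "eof"
        capture.extend(data)
        buf += data
        # Flush complete lines, keeping the unterminated tail around,
        # since a prompt carries no trailing newline.
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            # `line` is where prefixing/printing would happen
        if buf.endswith(SUDO_PROMPT):
            return "prompt"

cap = bytearray()
state = process_stream(io.BytesIO(b"building...\n[sudo] password"), cap)
print(state)  # prints: prompt
```

The key point is that prompt detection moves from "inspect every byte" to "inspect the unterminated tail after each chunk", which is what keeps the sudo handling working.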

Also:

  • if I'm not mistaken, the prompt and try_again features are active on both stdout and stderr, and I'm not sure this is actually needed in both cases,
  • I'm very interested in refactoring the log-output part of fabric so it can plug into the log factory of my choice (delegating the prefixing and stdout/stderr interleaving to some other class), because I want to write the logs directly to MongoDB.

Because this is very annoying for me right now, I'm ready to spend some time trying to fix it (after all, I've spent 2 days digging into pylibssh2 and libssh2 to understand how to use their public-key feature properly). Note that I'll be doing this at work, as my company (fotopedia) uses fabric in production.

I'm not a python expert, but provided you give me a heads up about the pitfalls I should be aware of, I'd love to help.

And so finishes this bug report. My thanks for fabric!

@bitprophet
Member

Thanks a lot for all the details :)

First off, the bytewise behavior needs to remain the default for now, as it's what enables the interactivity feature (docs), and changing it would break backwards compatibility. (We might change this in 2.x; hard to say right now. It may well remain the default.)

However I agree that allowing control over this would be useful for situations where users A) have lots of output and B) explicitly acknowledge that they don't need the ability to respond to prompts, etc inline (usually fine for non-newbie users who've already worked out the interactivity kinks in their recipes.)

If we can find a low-impact way to change this I'll merge it sometime soon; if the changes feel too drastic I will probably punt to 2.x. FYI.

What probably needs to happen:

  • Add a config option controlling this "number of bytes read" value, and have io.py leverage it in the read calls.
  • Update the rest of io.py to address the assumptions about working from a single byte -- the main problem here is detecting newlines, which is used both for handling --linewise and printing line prefixes.
  • Document this new config option
  • Add a note block to the docs (and probably an FAQ entry) highlighting that this option exists & to use it to speed up verbose and no-interactivity-needed situations.
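As a sketch of the first step in that plan, the option could resolve like this; the name `output_read_size` and this helper are hypothetical, since nothing like it existed in Fabric at the time:

```python
DEFAULT_READ_SIZE = 1  # byte-wise stays the default, preserving interactivity

def resolve_read_size(env_settings):
    """Look up the hypothetical `output_read_size` option in an
    env-style settings dict, falling back to the byte-wise default."""
    size = int(env_settings.get("output_read_size", DEFAULT_READ_SIZE))
    if size < 1:
        raise ValueError("output_read_size must be a positive byte count")
    return size

# io.py's read calls would then use func(resolve_read_size(env))
# instead of the hard-coded func(1).
print(resolve_read_size({}))                          # prints: 1
print(resolve_read_size({"output_read_size": 1024}))  # prints: 1024
```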

@octplane
Author

If I'm not mistaken, the interactivity feature is only active if self._capture is not None (as per https://github.com/fabric/fabric/blob/master/fabric/io.py#L69 ). If that's the case, then we can increase the number of bytes read per call when we know there will be no capture. This is a very simple fix, IMHO, that could still speed up non-interactive runs. It also doesn't require any additional configuration or documentation...

(edit: seems like I'm mistaken as capture is always used even if there is no interactive behaviour expected. How sad :/)

@octplane
Author

Created a pull request that differs slightly from what we discussed. The diff, however, is quite small.

#716

@bitprophet
Member

Hrm. I'm -0 on that approach. Capturing shouldn't be considered orthogonal to interactivity, even if it makes the code changes easier to handle (though your PR mentions that the line prompt remains to be fixed -- that's important too =/).

That, plus your earlier comment, makes it sound like you might not realize how important capturing is for most users -- it's definitely a core use case for run/sudo, e.g. stdoutish_value = run('something that may or may not require interactivity').

Put another way, it's never OK to turn off capturing -- IMO this feature/change should be limited to a tradeoff between interactivity (small byte-read size) and speed (larger/configurable byte-read size). That probably will make it tougher to implement :(

Final, mostly unrelated note -- why the mutation into a class? I'm not a fan of doing that to existing code unless one strongly needs "classy" behavior, e.g. subclass+override or keeping a handle on frequently referenced state.

Sorry for being a negative nancy :) I appreciate you taking the time to try and hash this out!

@octplane
Author

Hello Jeff,

Thanks for the feedback. As I'm about to go on holiday, this will have to wait until I'm back. I understand your concern and will try to propose a changeset that preserves the interactivity and capturability of the stream while reading it in chunks. I'm almost certain this can be achieved without too many headaches.

I've mutated the loop into a class for two reasons. First, I found it difficult to understand the different states of the small state machine embedded inside without creating additional functions (or methods) encapsulating the prompt, try-again, etc. behaviour. Secondly, I actually captured some heavily used internal state (such as the _capture and _stream objects).

Another important reason: this is definitely _the_ part of the code where the solution to this issue lies, so I really need to be able to isolate all its moving parts and be sure it will still work after I've rewritten it...

(my co-worker told me that I'm actually the kind of guy who mutates scripts into classes a bit too early sometimes, so I might be the problem in this case :))

@bitprophet
Member

That all sounds fair. Thanks again -- will wait to hear back. (I'm going on vaca for a few days as well, actually. Enjoy your own :))

@vdboor

vdboor commented Aug 31, 2012

I'd love to know the results of this discussion! I also noticed that fabric can be slow when a command generates a lot of output. I'd like to fetch a mysqldump from a remote server, and this takes:

  • 38 seconds with fabric's run to localhost
  • 0.2 seconds in a ssh command to localhost
  • 0.08 seconds when running mysqldump directly in the shell

Interestingly enough, the following also worked really well, within 1 second:

print local("ssh -q -i {env.key_filename} {env.user}@{env.host_string} -T '{cmd}'".format(env=env, cmd="your-command"), capture=True)

@bitprophet
Member

@vdboor Yea, it's not really a global "run within Fabric == SLOW" thing; rather, the Python SSH library, and/or the way we're buffering network traffic with it, is what's slow. Using local() is simply Python spinning up a new shell with your command in it -- it'll be exactly like running the ssh command in your bash shell, with really minor Python VM startup overhead.

@octplane Eagerly awaiting your iteration on the pull request whenever you're back, & thanks again.

@octplane
Author

Ohai,

Pull request is there: #733

@bitprophet
Member

Merging into #733
