
Fabric is slow because stdout/stderr read byte per byte #715

Closed
octplane opened this issue Aug 23, 2012 · 10 comments

Comments

@octplane

Hi,

(Useless) introduction and other might-be bugs

Origin of the issue

I use fabric to build a maven artefact. When run in console, this build process takes around 30s and spits around 15000 lines of log. However, when using fabric it takes around _1m30s_ to complete the same process. Painfully slow.

I use it for all my deployment and control tasks (in particular, I run chef-client inside fab), and I started wondering whether fabric was making everything slower than it already was... 😢

First, failed attempt at fixing the issue

Not being very good at optimizing, I started suspecting the pure-Python ssh library of being a bit slow, and wanted to throw in a pure C implementation of ssh to see whether it performed much better. I ended up trying libssh2, which has an incomplete Python binding. However, it's complete enough to replace ssh in the fabric code. Replacing ssh with pylibssh2 is almost straightforward, although I feel there is a missing "ssh" interface layer between fabric and its ssh library. I'm not sure whether being able to swap the ssh backend easily would be a good feature for fabric, but it would have helped me test faster.

My first tests indicate that fabric+libssh2 is as slow as fabric+ssh. This is when I suspected I was hunting for the wrong duck.

Actual Issue

The actual issue is that fabric reads its channel's stdout/stderr byte by byte. Changing line 60 of io.py from func(1) to func(1024), and ignoring all the side-effect issues, boosts the overall performance in my case.
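To see why the read size dominates, here is a minimal stand-in that times the same drain loop with both values; it uses io.BytesIO in place of a real ssh channel, and the drain helper is illustrative, not Fabric's actual code:

```python
import io
import time

def drain(channel, chunk_size):
    """Read everything from `channel`, `chunk_size` bytes at a time.

    Stand-in for the loop in io.py: `channel` is any file-like object
    with a read() method (a real ssh channel in Fabric's case).
    """
    total = 0
    while True:
        data = channel.read(chunk_size)
        if not data:  # an empty read signals EOF
            break
        total += len(data)
    return total

payload = b"x" * (1 << 20)  # 1 MiB of fake command output

for size in (1, 1024):
    start = time.perf_counter()
    n = drain(io.BytesIO(payload), size)
    print("chunk=%4d: %d bytes in %.3fs" % (size, n, time.perf_counter() - start))
```

Even with no network involved, the per-call overhead of a million read(1) calls dwarfs that of a thousand read(1024) calls.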

I guess this should be rewritten so that:

  • It reads the output in larger blocks (1024 is probably a bit much; I'd go for 128, for example)
  • It stays compatible with ssh and, in particular, handles EOF correctly
  • It still works well with the sudo prompt management (I feel this is where the complexity is!)
  • It still works with _prefix
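A chunked loop can still satisfy the prompt and EOF points above by buffering reads and checking the tail of the buffer. Here is a rough sketch under those assumptions; the names and the prompt constant are hypothetical stand-ins, not the actual io.py code:

```python
import io

SUDO_PROMPT = b"[sudo] password"  # stand-in for fabric's configured prompt

def process_stream(channel, capture, chunk_size=128):
    """Chunked read loop that still spots a trailing sudo prompt.

    `channel` is any file-like object with read(); `capture` is a
    bytearray accumulating all output, as fabric's capture does.
    Returns "prompt" if the buffered tail ends with the prompt
    (the caller would then send the password), else "eof".
    """
    buf = b""
    while True:
        data = channel.read(chunk_size)
        if not data:  # EOF
            return "eof"
        capture.extend(data)
        buf += data
        # Flush complete lines, keeping the unterminated tail around,
        # since a prompt carries no trailing newline.
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            # `line` is where prefixing/printing would happen
        if buf.endswith(SUDO_PROMPT):
            return "prompt"

cap = bytearray()
state = process_stream(io.BytesIO(b"building...\n[sudo] password"), cap)
print(state)  # prints: prompt
```

The key point is that prompt detection moves from "inspect every byte" to "inspect the unterminated tail after each chunk", which is what keeps the sudo handling working.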

Also:

  • if I'm not mistaken, the prompt and try_again features are active on both stdout and stderr, and I'm not sure this is actually needed in both cases,
  • I'm very interested in refactoring the log-output part of fabric so it can plug into the log factory of my choice (delegating the prefixing and stdout/stderr interleaving to some other class), because I want to write the logs directly to MongoDB.

Because this is very annoying for me right now, I'm ready to spend some time trying to fix it (after all, I've spent 2 days digging into pylibssh2 and libssh2 to understand how to use their public-key feature properly). Note that I'll be doing this at work, as my company (fotopedia) uses fabric in production.

I'm not a python expert, but provided you give me a heads up about the pitfalls I should be aware of, I'd love to help.

And so finishes this bug report. My thanks for fabric!

@bitprophet
Member

Thanks a lot for all the details :)

First off, the bytewise behavior needs to remain the default for now, as it's what enables the interactivity feature (docs), and changing it would break backwards compatibility. (We might change this in 2.x; hard to say right now. It may well remain the default.)

However I agree that allowing control over this would be useful for situations where users A) have lots of output and B) explicitly acknowledge that they don't need the ability to respond to prompts, etc inline (usually fine for non-newbie users who've already worked out the interactivity kinks in their recipes.)

If we can find a low-impact way to change this I'll merge it sometime soon; if the changes feel too drastic I will probably punt to 2.x. FYI.

What probably needs to happen:

  • Add a config option controlling this "number of bytes read" value, and have io.py leverage it in the read calls.
  • Update the rest of io.py to address the assumptions about working from a single byte -- the main problem here is detecting newlines, which is used both for handling --linewise and printing line prefixes.
  • Document this new config option
  • Add a note block to the docs (and probably an FAQ entry) highlighting that this option exists & to use it to speed up verbose and no-interactivity-needed situations.
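As a sketch of the first step in that plan, the option could resolve like this; the name `output_read_size` and this helper are hypothetical, since nothing like it existed in Fabric at the time:

```python
DEFAULT_READ_SIZE = 1  # byte-wise stays the default, preserving interactivity

def resolve_read_size(env_settings):
    """Look up the hypothetical `output_read_size` option in an
    env-style settings dict, falling back to the byte-wise default."""
    size = int(env_settings.get("output_read_size", DEFAULT_READ_SIZE))
    if size < 1:
        raise ValueError("output_read_size must be a positive byte count")
    return size

# io.py's read calls would then use func(resolve_read_size(env))
# instead of the hard-coded func(1).
print(resolve_read_size({}))                          # prints: 1
print(resolve_read_size({"output_read_size": 1024}))  # prints: 1024
```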

@octplane
Author

If I'm not mistaken, the interactivity feature is only active if self._capture is not None (as per https://github.com/fabric/fabric/blob/master/fabric/io.py#L69 ). If that's the case, then we can increase the number of bytes read per call when we know there will be no capture. This is a very simple fix, IMHO, that could still speed up non-interactive runs. It also doesn't require any additional configuration or documentation...

(edit: seems like I'm mistaken as capture is always used even if there is no interactive behaviour expected. How sad :/)

@octplane
Author

Created a pull request that differs slightly from what we discussed. The diff, however, is quite small.

#716

@bitprophet
Member

Hrm. I'm -0 on that approach. Capturing shouldn't be considered orthogonal to interactivity, even if it makes the code changes easier to handle (though your PR mentions that the line prompt remains to be fixed -- that's important too =/).

That, plus your earlier comment, makes it sound like you might not realize how important capturing is for most users -- it's definitely a core use case for run/sudo, e.g. stdoutish_value = run('something that may or may not require interactivity').

Put another way, it's never OK to turn off capturing -- IMO this feature/change should be limited to a tradeoff between interactivity (small byte-read size) and speed (larger/configurable byte-read size). That probably will make it tougher to implement :(

Final, mostly unrelated note -- why the mutation into a class? I'm not a fan of doing that to existing code unless one strongly needs "classy" behavior, e.g. subclass+override or keeping a handle on frequently referenced state.

Sorry for being a negative nancy :) I appreciate you taking the time to try and hash this out!

@octplane
Author

Hello Jeff,

Thanks for the feedback. As I'm about to go on holiday, this will have to wait until I'm back. I understand your concern and will try to propose a changeset that preserves the interactivity and capturability of the stream while reading it in chunks. I'm almost certain this can be achieved without too many headaches.

I've mutated the loop into a class for two reasons. First, I found it difficult to understand the different states of the small state machine embedded inside without creating additional functions (or methods) encapsulating the prompt, try-again, etc. behaviour. Secondly, I actually captured some heavily used internal state (such as the _capture and _stream objects).

Another important reason: this is definitely _the_ part of the code where the solution to this issue lies, so I really need to be able to isolate all its moving parts and be sure it will still work after I've rewritten it...

(my co-worker told me that I'm actually the kind of guy who mutates scripts into classes a bit too early sometimes, so I might be the problem in this case :))

@bitprophet
Member

That all sounds fair. Thanks again -- will wait to hear back. (I'm going on vaca for a few days as well, actually. Enjoy your own :))

@vdboor

vdboor commented Aug 31, 2012

I'd love to know the results of this discussion! I also noticed that fabric can be slow when a command generates a lot of output. I'd like to fetch a mysqldump from a remote server, and this takes:

  • 38 seconds with fabric's run to localhost
  • 0.2 seconds in a ssh command to localhost
  • 0.08 seconds when running mysqldump directly in the shell

Interestingly enough, the following also worked really well, within 1 second:

print local("ssh -q -i {env.key_filename} {env.user}@{env.host_string} -T '{cmd}'".format(env=env, cmd="your-command"), capture=True)

@bitprophet
Member

@vdboor Yea, it's not really a global "run within Fabric == SLOW" thing; rather, the Python SSH library, and/or the way we're buffering network traffic with it, is what's slow. Using local() is simply Python spinning up a new shell with your command in it -- it'll be exactly like running the ssh command in your bash shell, with really minor Python VM startup overhead.

@octplane Eagerly awaiting your iteration on the pull request whenever you're back, & thanks again.

@octplane
Author

Ohai,

Pull request is there: #733

@bitprophet
Member

Merging into #733
