New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize pty buffering and searching #464

Merged
merged 1 commit into from Feb 8, 2018

Conversation

Projects
None yet
4 participants
@ryanpetrello
Contributor

ryanpetrello commented Jan 15, 2018

Python strings are slow and expensive as buffers because they're immutable;
replace the output buffer with a StringIO/BytesIO object

see: #438
see: #385
see: #352

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch 6 times, most recently from d6212e3 to 2624bdc Jan 15, 2018

@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Jan 16, 2018

Example test case that illustrates the performance improvement:

    def test_large_stdout_stream(self):
        e = pexpect.spawn('openssl rand -base64 {}'.format(1024*1024*25), searchwindowsize=1000, timeout=None)  # print ~25MB
        e.expect(['Password:', pexpect.EOF, pexpect.TIMEOUT])

Before (master branch):

$ time py.test -k large_stdout_stream
140.38s user 178.81s system 96% cpu 5:29.90 total

After:

$ time py.test -k large_stdout_stream
2.74s user 0.89s system 51% cpu 6.984 total
@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Jan 16, 2018

@noahspurrier @jquast @takluyver as best I can tell, you all are the current maintainers of pexpect? I'd love to get your thoughts on this - I'm working on another open source project that makes use of pexpect, and this resolves a performance issue we've run into.

@wwitzel3

LGTM

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch from 78390f8 to 48853bc Jan 16, 2018

@takluyver

This comment has been minimized.

Member

takluyver commented Feb 5, 2018

Thanks. I think this makes sense, but I also think that there are almost certainly people relying on spawn.buffer being a bytes/string object, rather than a file-like object. I've already learned the hard way that with code as old as pexpect, it's hard to make any change without breaking things.

Is it practical to make a new private attribute like spawn._buffer_io, and then make buffer a property which returns spawn._buffer_io.getvalue()? That way the code in pexpect can do things efficiently, but code that gets the buffer as a string should still work.

(I'd probably also implement setting spawn.buffer = '...' - hopefully there's less code that does that, but it's not much extra work.)

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch from a684cf3 to fb9dfd2 Feb 5, 2018

@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Feb 5, 2018

@takluyver Good point re: backwards compatibility - I've uploaded another commit that I think should address this.

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch 2 times, most recently from f57e6c7 to 5038b5f Feb 5, 2018

spawn.buffer = incoming
elif self.searchwindowsize:
spawn._buffer = spawn.buffer_type(window)

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

I think this could do the wrong thing in some cases. I was just testing the BytesIO/StringIO constructors because I was going to suggest getting rid of the .write() on line 32, and I noticed that if you pass data into the constructor, it leaves the cursor at the beginning, so .tell() returns 0, and writing overwrites the data already there.

import io
s = io.StringIO('abcd')
s.tell()  # -> 0
s.write('zz')
s.getvalue()  # -> 'zzcd'

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

Which constructor? I'm not sure I follow this comment.

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

Oh, are you suggesting that I should instead do:

spawn._buffer = spawn.buffer_type()
spawn._buffer.write(window)

?

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

Yup, that's what I meant. That way, writing more data with _buffer.write() afterwards will add it to the end of the buffer, rather than overwriting existing data.

@@ -263,25 +251,16 @@ def __str__(self):
ss = list(zip(*ss))[1]
return '\n'.join(ss)
def search(self, buffer, freshlen, searchwindowsize=None):

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

I'm not sure that we can get rid of these parameters so easily - backwards compatibility again.

@@ -140,6 +147,15 @@ def _coerce_send_string(self, s):
return s.encode('utf-8')
return s
def get_buffer(self):

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

Let's make these private methods (_ prefix), so people don't start using them directly and cause us another backwards compatibility headache.

@@ -1,3 +1,8 @@
try:
from StringIO import StringIO

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

I think we should use io.StringIO in all cases, because that's strict about unicode, whereas StringIO.StringIO isn't.

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

(By 'in all cases', I mean on Python 2 as well as Python 3 - of course we still need to use io.BytesIO when we're handling bytes)

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch from 5038b5f to a19007a Feb 5, 2018

@@ -157,8 +168,8 @@ def __str__(self):
ss = list(zip(*ss))[1]
return '\n'.join(ss)
def search(self, buffer, freshlen, searchwindowsize=None):
'''This searches 'buffer' for the first occurrence of one of the search
def search(self, buff, freshlen, searchwindowsize=None):

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

@takluyver related to your comment, I just reverted these search methods - the changes I made ended up being unnecessary.

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

Great, thanks. Could you also revert buff back to buffer in the parameters? Any parameter in Python can be passed as a keyword argument, so people might be relying on the name.

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

Good point, I'll fix that.

@@ -140,6 +143,15 @@ def _coerce_send_string(self, s):
return s.encode('utf-8')
return s
def _get_buffer(self):

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

@takluyver private methods now

@@ -1,3 +1,4 @@
from io import StringIO, BytesIO

This comment has been minimized.

@ryanpetrello

ryanpetrello Feb 5, 2018

Contributor

@takluyver import fixed

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch from a19007a to 068434a Feb 5, 2018

@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Feb 5, 2018

@takluyver okay, I've posted a new commit that I think addresses the remaining feedback.

@takluyver

Thanks, this looks good to me. I'll give @jquast and @MountainRider a chance to have a look, but otherwise I'm happy to merge it in a couple of days (feel free to remind me if I forget).

self._buffer = self.buffer_type()
self._buffer.write(value)
buffer = property(_get_buffer, _set_buffer)

This comment has been minimized.

@takluyver

takluyver Feb 5, 2018

Member

Could we have a comment to remind anyone working on the code that this interface as a string is required for backwards compatibility?

optimize pty buffering and searching
Python strings are slow and expensive as buffers because they're
immutable; replace the output buffer with a StringIO/BytesIO
object

see: #438

@ryanpetrello ryanpetrello force-pushed the ryanpetrello:optimized-read-buffer branch from 068434a to fd7332f Feb 5, 2018

@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Feb 5, 2018

Sounds great, @takluyver. Thanks for taking the time to review this!

@ryanpetrello

This comment has been minimized.

Contributor

ryanpetrello commented Feb 8, 2018

Hey @takluyver,

Any chance this is ready to merge?
Thanks!

@takluyver takluyver merged commit 6ba1fbb into pexpect:master Feb 8, 2018

1 of 2 checks passed

coverage/coveralls Coverage decreased (-0.005%) to 88.743%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@takluyver

This comment has been minimized.

Member

takluyver commented Feb 8, 2018

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment