Support memoryview encoding (as a no-op) #1266

Cody-G · 2020-01-23T23:38:18Z

Pull Request check-list

Please make sure to review and check all of these items:

[(not sure) ] Does $ tox pass with this change (including linting)?
Does travis tests pass with this change (enable it first in your forked repo and wait for the travis build to finish)?
Is the new or changed code fully tested?
Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Supports passing memoryviews to redis-py as outlined in #1265

andymccurdy · 2020-01-23T23:58:16Z

Looks like Py27 is having an issue with this.

Also can we move the test to the test_encoding.py module? I'd prefer a simple test that persists the a memoryview value to a Redis string and verifies a subsequent get() returns the correct sequence of bytes. No need to complicate it with Redis streams.

Cody-G · 2020-01-24T00:11:12Z

Looks like Py27 is having an issue with this.

Oops! I'm not well-practiced maintaining python version compatibility but I'll give it a shot.

Also can we move the test to the test_encoding.py module? I'd prefer a simple test that persists the a memoryview value to a Redis string and verifies a subsequent get() returns the correct sequence of bytes. No need to complicate it with Redis streams.

Yes this makes sense, I had a feeling it wasn't the best place for the test code.

…mands

andymccurdy · 2020-01-24T18:29:46Z

@Cody-G Does memoryview.tobytes() cause a memcpy of the underlying data? If so I thought that's what this was trying to avoid.

If memoryview.tobytes() does not copy the memory, then I'd argue that Encoder.encode() should special case memoryview objects and convert them to bytes.

If memoryview.tobytes() DOES copy the memory and b''.join((my_memory_view, ...)) DOES NOT copy the memory, then it seems like we'd only need to call .tobytes() in Python 2.7. In which case we could have a simple function in _compat.py that's a noop in Python 3 but calls .tobytes() in Python 2.

Cody-G · 2020-01-24T18:30:08Z

Tests are now passing on all Python versions. I'm not sure why, but the python 2.7 tests seem to execute this branch but I think the other python versions tests do not. The problem occurs when we try to do a string join on a memoryview. I see that happening in other places in the code, like this line and this line, so my concern is that in many cases we'll end up incurring a memory copy anyway, and we won't get the benefit of using a memoryview. My main goal is that, at least for large arguments, they can make it all the way to the socket sendall without being copied in any way. Would it be acceptable to reduce the number of string joins further? I realize that all of this chunking/joining was implemented to improve network performance, and I don't want to compromise that, but I wonder if we can compromise somehow.

Cody-G · 2020-01-24T18:30:35Z

Oh I think we posted at the same time, let me read your post.

Cody-G · 2020-01-24T18:32:44Z

Does memoryview.tobytes() cause a memcpy of the underlying data?

Yes it does. With my last commit I compromised on that for small objects (less than the hard-coded size limit)

b''.join((my_memory_view, ...)) DOES NOT copy the memory

I was pretty sure this does copy the memory, but let me double check that

Cody-G · 2020-01-24T18:44:35Z

b''.join((my_memory_view, ...)) does in fact make a copy. It seems if we want to use memoryviews and send small chunks over the socket the recommendation is to use memoryview slices, which are zero-copy. That makes me wonder if redis-py could just wrap all bytestrings in memoryviews (because it's free anyway) and then always handle the chunking with slices. We could still join very small arguments at the cost of a copy, but those small arguments aren't the problem from a memory standpoint anyway.

andymccurdy · 2020-01-30T21:50:27Z

It looks like this PR always calls memoryview.tobytes() when it encounters a memoryview. Isn't this what we're trying to avoid?

It seems like the way to do this is to make a special case for memoryview in the pack_command loop and append the memoryview to the output list without converting it. Something like:

for arg in imap(self.encoder.encode, args):
    arg_length = len(arg)
    if len(buff) > buffer_cutoff or arg_length > buffer_cutoff or isinstance(arg, memoryview):
        # same code that exists currently

This will append the current buffer plus the length of the memoryview to the output list, and then append the memoryview directly without a copy.

Note that I haven't actually tested this. If you could that would be quite helpful. Thanks!

Cody-G · 2020-02-01T01:44:03Z

It looks like this PR always calls memoryview.tobytes() when it encounters a memoryview. Isn't this what we're trying to avoid?

Indeed it is, and this PR in its current form fails to avoid it. I wasn't sure how much to respect the current optimization of joining small strings before sending over the socket. In our company fork I took the more aggressive approach of never converting memoryviews to bytes, which is I think what you're suggesting. If you're ok with that approach I can update this PR / make a new one next week. Would definitely like to get this in somehow so we don't need to maintain our own fork. Thanks for your attention on this!

Cody-G · 2020-02-08T21:24:02Z

Superseded by #1285

Support memoryview encoding (as a no-op)

15afe7e

convert memoryviews to bytes when they get string-joined in short com…

09bf90f

…mands

Cody-G mentioned this pull request Feb 8, 2020

Support memoryview encoding/decoding as a no-op, calling tobytes() on… #1285

Closed

4 tasks

Cody-G closed this Feb 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support memoryview encoding (as a no-op) #1266

Support memoryview encoding (as a no-op) #1266

Uh oh!

Cody-G commented Jan 23, 2020 •

edited

Loading

Uh oh!

andymccurdy commented Jan 23, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

andymccurdy commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020 •

edited

Loading

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

andymccurdy commented Jan 30, 2020

Uh oh!

Cody-G commented Feb 1, 2020 •

edited

Loading

Uh oh!

Cody-G commented Feb 8, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Support memoryview encoding (as a no-op) #1266

Support memoryview encoding (as a no-op) #1266

Uh oh!

Conversation

Cody-G commented Jan 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request check-list

Description of change

Uh oh!

andymccurdy commented Jan 23, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

andymccurdy commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

Cody-G commented Jan 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cody-G commented Jan 24, 2020

Uh oh!

andymccurdy commented Jan 30, 2020

Uh oh!

Cody-G commented Feb 1, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cody-G commented Feb 8, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Cody-G commented Jan 23, 2020 •

edited

Loading

Cody-G commented Jan 24, 2020 •

edited

Loading

Cody-G commented Feb 1, 2020 •

edited

Loading