Skip to content

Conversation

@Cody-G
Copy link
Contributor

@Cody-G Cody-G commented Jan 23, 2020

Pull Request check-list

Please make sure to review and check all of these items:

  • [(not sure) ] Does $ tox pass with this change (including linting)?
  • Does travis tests pass with this change (enable it first in your forked repo and wait for the travis build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Supports passing memoryviews to redis-py as outlined in #1265

@andymccurdy
Copy link
Contributor

Looks like Py27 is having an issue with this.

Also can we move the test to the test_encoding.py module? I'd prefer a simple test that persists the a memoryview value to a Redis string and verifies a subsequent get() returns the correct sequence of bytes. No need to complicate it with Redis streams.

@Cody-G
Copy link
Contributor Author

Cody-G commented Jan 24, 2020

Looks like Py27 is having an issue with this.

Oops! I'm not well-practiced maintaining python version compatibility but I'll give it a shot.

Also can we move the test to the test_encoding.py module? I'd prefer a simple test that persists the a memoryview value to a Redis string and verifies a subsequent get() returns the correct sequence of bytes. No need to complicate it with Redis streams.

Yes this makes sense, I had a feeling it wasn't the best place for the test code.

@andymccurdy
Copy link
Contributor

@Cody-G Does memoryview.tobytes() cause a memcpy of the underlying data? If so I thought that's what this was trying to avoid.

If memoryview.tobytes() does not copy the memory, then I'd argue that Encoder.encode() should special case memoryview objects and convert them to bytes.

If memoryview.tobytes() DOES copy the memory and b''.join((my_memory_view, ...)) DOES NOT copy the memory, then it seems like we'd only need to call .tobytes() in Python 2.7. In which case we could have a simple function in _compat.py that's a noop in Python 3 but calls .tobytes() in Python 2.

@Cody-G
Copy link
Contributor Author

Cody-G commented Jan 24, 2020

Tests are now passing on all Python versions. I'm not sure why, but the python 2.7 tests seem to execute this branch but I think the other python versions tests do not. The problem occurs when we try to do a string join on a memoryview. I see that happening in other places in the code, like this line and this line, so my concern is that in many cases we'll end up incurring a memory copy anyway, and we won't get the benefit of using a memoryview. My main goal is that, at least for large arguments, they can make it all the way to the socket sendall without being copied in any way. Would it be acceptable to reduce the number of string joins further? I realize that all of this chunking/joining was implemented to improve network performance, and I don't want to compromise that, but I wonder if we can compromise somehow.

@Cody-G
Copy link
Contributor Author

Cody-G commented Jan 24, 2020

Oh I think we posted at the same time, let me read your post.

@Cody-G
Copy link
Contributor Author

Cody-G commented Jan 24, 2020

Does memoryview.tobytes() cause a memcpy of the underlying data?

Yes it does. With my last commit I compromised on that for small objects (less than the hard-coded size limit)

b''.join((my_memory_view, ...)) DOES NOT copy the memory

I was pretty sure this does copy the memory, but let me double check that

@Cody-G
Copy link
Contributor Author

Cody-G commented Jan 24, 2020

b''.join((my_memory_view, ...)) does in fact make a copy. It seems if we want to use memoryviews and send small chunks over the socket the recommendation is to use memoryview slices, which are zero-copy. That makes me wonder if redis-py could just wrap all bytestrings in memoryviews (because it's free anyway) and then always handle the chunking with slices. We could still join very small arguments at the cost of a copy, but those small arguments aren't the problem from a memory standpoint anyway.

@andymccurdy
Copy link
Contributor

It looks like this PR always calls memoryview.tobytes() when it encounters a memoryview. Isn't this what we're trying to avoid?

It seems like the way to do this is to make a special case for memoryview in the pack_command loop and append the memoryview to the output list without converting it. Something like:

for arg in imap(self.encoder.encode, args):
    arg_length = len(arg)
    if len(buff) > buffer_cutoff or arg_length > buffer_cutoff or isinstance(arg, memoryview):
        # same code that exists currently

This will append the current buffer plus the length of the memoryview to the output list, and then append the memoryview directly without a copy.

Note that I haven't actually tested this. If you could that would be quite helpful. Thanks!

@Cody-G
Copy link
Contributor Author

Cody-G commented Feb 1, 2020

It looks like this PR always calls memoryview.tobytes() when it encounters a memoryview. Isn't this what we're trying to avoid?

Indeed it is, and this PR in its current form fails to avoid it. I wasn't sure how much to respect the current optimization of joining small strings before sending over the socket. In our company fork I took the more aggressive approach of never converting memoryviews to bytes, which is I think what you're suggesting. If you're ok with that approach I can update this PR / make a new one next week. Would definitely like to get this in somehow so we don't need to maintain our own fork. Thanks for your attention on this!

@Cody-G
Copy link
Contributor Author

Cody-G commented Feb 8, 2020

Superseded by #1285

@Cody-G Cody-G closed this Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants